btrfs: fix error handling in btrfs_dev_replace_start
diff mbox series

Message ID 20180906195217.12163-1-jeffm@suse.com
State New
Headers show
Series
  • btrfs: fix error handling in btrfs_dev_replace_start
Related show

Commit Message

Jeff Mahoney Sept. 6, 2018, 7:52 p.m. UTC
From: Jeff Mahoney <jeffm@suse.com>

When we fail to start a transaction in btrfs_dev_replace_start,
we leave dev_replace->replace_start set to STARTED but clear
->srcdev and ->tgtdev.  Later, that can result in an Oops in
btrfs_dev_replace_progress when having state set to STARTED or
SUSPENDED implies that ->srcdev is valid.

Also fix error handling when the state is already STARTED or
SUSPENDED while starting.  That, too, will clear ->srcdev and ->tgtdev
even though it doesn't own them.  This should be an impossible case to
hit since we should be protected by the BTRFS_FS_EXCL_OP bit being set.
Let's add an ASSERT there while we're at it.

Fixes: e93c89c1aaaaa (Btrfs: add new sources for device replace code)
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
---
 fs/btrfs/dev-replace.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

Comments

David Sterba Sept. 10, 2018, 5:47 p.m. UTC | #1
On Thu, Sep 06, 2018 at 03:52:17PM -0400, jeffm@suse.com wrote:
> From: Jeff Mahoney <jeffm@suse.com>
> 
> When we fail to start a transaction in btrfs_dev_replace_start,
> we leave dev_replace->replace_start set to STARTED but clear
> ->srcdev and ->tgtdev.  Later, that can result in an Oops in
> btrfs_dev_replace_progress when having state set to STARTED or
> SUSPENDED implies that ->srcdev is valid.
> 
> Also fix error handling when the state is already STARTED or
> SUSPENDED while starting.  That, too, will clear ->srcdev and ->tgtdev
> even though it doesn't own them.  This should be an impossible case to
> hit since we should be protected by the BTRFS_FS_EXCL_OP bit being set.
> Let's add an ASSERT there while we're at it.
> 
> Fixes: e93c89c1aaaaa (Btrfs: add new sources for device replace code)
> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
> ---
>  fs/btrfs/dev-replace.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
> index e2ba0419297a..0581c8570a05 100644
> --- a/fs/btrfs/dev-replace.c
> +++ b/fs/btrfs/dev-replace.c
> @@ -445,6 +445,7 @@ int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
>  		break;
>  	case BTRFS_IOCTL_DEV_REPLACE_STATE_STARTED:
>  	case BTRFS_IOCTL_DEV_REPLACE_STATE_SUSPENDED:
> +		ASSERT(0);

I don't like it fixed that way. The asserts get compiled in
conditionally, so the behaviour might be differend dending on the
config. If the cases are unreachable, then the two cases can be replaced
by a default: and return EINVAL.

Otherwise ok.
David Sterba Sept. 14, 2018, 3:38 p.m. UTC | #2
On Mon, Sep 10, 2018 at 07:47:29PM +0200, David Sterba wrote:
> On Thu, Sep 06, 2018 at 03:52:17PM -0400, jeffm@suse.com wrote:
> > From: Jeff Mahoney <jeffm@suse.com>
> > 
> > When we fail to start a transaction in btrfs_dev_replace_start,
> > we leave dev_replace->replace_start set to STARTED but clear
> > ->srcdev and ->tgtdev.  Later, that can result in an Oops in
> > btrfs_dev_replace_progress when having state set to STARTED or
> > SUSPENDED implies that ->srcdev is valid.
> > 
> > Also fix error handling when the state is already STARTED or
> > SUSPENDED while starting.  That, too, will clear ->srcdev and ->tgtdev
> > even though it doesn't own them.  This should be an impossible case to
> > hit since we should be protected by the BTRFS_FS_EXCL_OP bit being set.
> > Let's add an ASSERT there while we're at it.
> > 
> > Fixes: e93c89c1aaaaa (Btrfs: add new sources for device replace code)
> > Signed-off-by: Jeff Mahoney <jeffm@suse.com>
> > ---
> >  fs/btrfs/dev-replace.c | 7 +++++--
> >  1 file changed, 5 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
> > index e2ba0419297a..0581c8570a05 100644
> > --- a/fs/btrfs/dev-replace.c
> > +++ b/fs/btrfs/dev-replace.c
> > @@ -445,6 +445,7 @@ int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
> >  		break;
> >  	case BTRFS_IOCTL_DEV_REPLACE_STATE_STARTED:
> >  	case BTRFS_IOCTL_DEV_REPLACE_STATE_SUSPENDED:
> > +		ASSERT(0);
> 
> I don't like it fixed that way. The asserts get compiled in
> conditionally, so the behaviour might be differend dending on the
> config. If the cases are unreachable, then the two cases can be replaced
> by a default: and return EINVAL.

As this is a minor issue and a matter of cleanup, I'm going to merge the
patch as-is because it's a bugfix.

Patch
diff mbox series

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index e2ba0419297a..0581c8570a05 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -445,6 +445,7 @@  int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
 		break;
 	case BTRFS_IOCTL_DEV_REPLACE_STATE_STARTED:
 	case BTRFS_IOCTL_DEV_REPLACE_STATE_SUSPENDED:
+		ASSERT(0);
 		ret = BTRFS_IOCTL_DEV_REPLACE_RESULT_ALREADY_STARTED;
 		goto leave;
 	}
@@ -487,6 +488,10 @@  int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
 	if (IS_ERR(trans)) {
 		ret = PTR_ERR(trans);
 		btrfs_dev_replace_write_lock(dev_replace);
+		dev_replace->replace_state =
+			BTRFS_IOCTL_DEV_REPLACE_STATE_NEVER_STARTED;
+		dev_replace->srcdev = NULL;
+		dev_replace->tgtdev = NULL;
 		goto leave;
 	}
 
@@ -508,8 +513,6 @@  int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
 	return ret;
 
 leave:
-	dev_replace->srcdev = NULL;
-	dev_replace->tgtdev = NULL;
 	btrfs_dev_replace_write_unlock(dev_replace);
 	btrfs_destroy_dev_replace_tgtdev(fs_info, tgt_device);
 	return ret;