Message ID | 1541946144-8174-8-git-send-email-anand.jain@oracle.com (mailing list archive) |
---|---|
State | New, archived |
Series | fix replace-start and replace-cancel racing |
On Sun, Nov 11, 2018 at 10:22:22PM +0800, Anand Jain wrote:
> When we successfully cancel the replace, its scrub returns -ECANCELED,
> which is then passed to btrfs_dev_replace_finishing(). It cleans up
> based on the scrub's returned status and propagates the same -ECANCELED
> back to the parent function. As of now only the user can cancel the
> replace-scrub, so it's ok to quieten the warn here.
>
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> ---
>  fs/btrfs/dev-replace.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
> index 1dc8e86546db..9031a362921a 100644
> --- a/fs/btrfs/dev-replace.c
> +++ b/fs/btrfs/dev-replace.c
> @@ -497,7 +497,7 @@ static int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
>  	ret = btrfs_dev_replace_finishing(fs_info, ret);
>  	if (ret == -EINPROGRESS) {
>  		ret = BTRFS_IOCTL_DEV_REPLACE_RESULT_SCRUB_INPROGRESS;
> -	} else {
> +	} else if (ret != -ECANCELED) {
>  		WARN_ON(ret);

While this looks ok, can you please rework it so there are no WARN_ON at
random places in device-replace, poorly substituting error handling?

The code flow in this case could be changed to make explicit checks for
the known codes and then a catch-all branch like:

	if (ret == -EINPROGRESS) {
		...
	} else if (ret == -ESOMETHINGELSE) {
		...
	} else {
		unknown error, print error and do a proper cleanup
	}

>  	}
>
> @@ -954,7 +954,7 @@ static int btrfs_dev_replace_kthread(void *data)
>  			      btrfs_device_get_total_bytes(dev_replace->srcdev),
>  			      &dev_replace->scrub_progress, 0, 1);
>  	ret = btrfs_dev_replace_finishing(fs_info, ret);
> -	WARN_ON(ret);
> +	WARN_ON(ret && ret != -ECANCELED);

This one too, thanks.

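
The branch structure suggested in the review above could look roughly like the following userspace sketch. It is not the kernel code: `sketch_map_replace_result()` and `SKETCH_RESULT_SCRUB_INPROGRESS` are hypothetical names invented for illustration, the explicit `ret == 0` branch is an assumption added here, and `fprintf()` merely stands in for whatever logging and cleanup the real function would perform.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/*
 * Illustrative stand-in for BTRFS_IOCTL_DEV_REPLACE_RESULT_SCRUB_INPROGRESS.
 * The name and value are assumptions for this sketch, not kernel definitions.
 */
#define SKETCH_RESULT_SCRUB_INPROGRESS 100

/*
 * Hypothetical helper: translate the value returned by the finishing step
 * into what the caller reports, using explicit branches for known codes.
 */
static int sketch_map_replace_result(int ret)
{
	if (ret == 0) {
		/* Replace finished cleanly. */
		return 0;
	} else if (ret == -EINPROGRESS) {
		/* A scrub is already running: report a status, not an error. */
		return SKETCH_RESULT_SCRUB_INPROGRESS;
	} else if (ret == -ECANCELED) {
		/* The user cancelled the replace: expected, so stay quiet. */
		return ret;
	} else {
		/* Unknown error: log it; real code would also clean up here. */
		fprintf(stderr, "dev-replace: unexpected error %d\n", ret);
		return ret;
	}
}

/* Usage: known codes map silently, anything else is logged and passed on. */
static void sketch_map_replace_result_demo(void)
{
	assert(sketch_map_replace_result(0) == 0);
	assert(sketch_map_replace_result(-EINPROGRESS) ==
	       SKETCH_RESULT_SCRUB_INPROGRESS);
	assert(sketch_map_replace_result(-ECANCELED) == -ECANCELED);
	assert(sketch_map_replace_result(-EIO) == -EIO);
}
```

The point of the catch-all branch is that an unexpected value produces a readable message and a defined cleanup path rather than only a stack trace from WARN_ON.
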
On 11/15/2018 11:35 PM, David Sterba wrote:
> On Sun, Nov 11, 2018 at 10:22:22PM +0800, Anand Jain wrote:
>> When we successfully cancel the replace, its scrub returns -ECANCELED,
>> which is then passed to btrfs_dev_replace_finishing(). It cleans up
>> based on the scrub's returned status and propagates the same -ECANCELED
>> back to the parent function. As of now only the user can cancel the
>> replace-scrub, so it's ok to quieten the warn here.
>>
>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>> ---
>>  fs/btrfs/dev-replace.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
>> index 1dc8e86546db..9031a362921a 100644
>> --- a/fs/btrfs/dev-replace.c
>> +++ b/fs/btrfs/dev-replace.c
>> @@ -497,7 +497,7 @@ static int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
>>  	ret = btrfs_dev_replace_finishing(fs_info, ret);
>>  	if (ret == -EINPROGRESS) {
>>  		ret = BTRFS_IOCTL_DEV_REPLACE_RESULT_SCRUB_INPROGRESS;
>> -	} else {
>> +	} else if (ret != -ECANCELED) {
>>  		WARN_ON(ret);
>
> While this looks ok, can you please rework it so there are no WARN_ON at
> random places in device-replace, poorly substituting error handling?
>
> The code flow in this case could be changed to make explicit checks for
> the known codes and then a catch-all branch like:
>
> 	if (ret == -EINPROGRESS) {
> 		...
> 	} else if (ret == -ESOMETHINGELSE) {
> 		...
> 	} else {
> 		unknown error, print error and do a proper cleanup
> 	}
>

As below..

>>  	}
>>
>> @@ -954,7 +954,7 @@ static int btrfs_dev_replace_kthread(void *data)
>>  			      btrfs_device_get_total_bytes(dev_replace->srcdev),
>>  			      &dev_replace->scrub_progress, 0, 1);
>>  	ret = btrfs_dev_replace_finishing(fs_info, ret);
>> -	WARN_ON(ret);
>> +	WARN_ON(ret && ret != -ECANCELED);
>
> This one too, thanks.

btrfs_dev_scrub() can return quite a lot of errno values, which are
passed here through btrfs_dev_replace_finishing(), so it won't be
possible to code them all.

(we use -ECANCELED only in replace and balance).

Thanks, Anand

On Fri, Nov 16, 2018 at 08:06:36PM +0800, Anand Jain wrote:
> On 11/15/2018 11:35 PM, David Sterba wrote:
> > On Sun, Nov 11, 2018 at 10:22:22PM +0800, Anand Jain wrote:
> >> When we successfully cancel the replace, its scrub returns -ECANCELED,
> >> which is then passed to btrfs_dev_replace_finishing(). It cleans up
> >> based on the scrub's returned status and propagates the same -ECANCELED
> >> back to the parent function. As of now only the user can cancel the
> >> replace-scrub, so it's ok to quieten the warn here.
> >>
> >> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> >> ---
> >>  fs/btrfs/dev-replace.c | 4 ++--
> >>  1 file changed, 2 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
> >> index 1dc8e86546db..9031a362921a 100644
> >> --- a/fs/btrfs/dev-replace.c
> >> +++ b/fs/btrfs/dev-replace.c
> >> @@ -497,7 +497,7 @@ static int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
> >>  	ret = btrfs_dev_replace_finishing(fs_info, ret);
> >>  	if (ret == -EINPROGRESS) {
> >>  		ret = BTRFS_IOCTL_DEV_REPLACE_RESULT_SCRUB_INPROGRESS;
> >> -	} else {
> >> +	} else if (ret != -ECANCELED) {
> >>  		WARN_ON(ret);
> >
> > While this looks ok, can you please rework it so there are no WARN_ON at
> > random places in device-replace, poorly substituting error handling?
> >
> > The code flow in this case could be changed to make explicit checks for
> > the known codes and then a catch-all branch like:
> >
> > 	if (ret == -EINPROGRESS) {
> > 		...
> > 	} else if (ret == -ESOMETHINGELSE) {
> > 		...
> > 	} else {
> > 		unknown error, print error and do a proper cleanup
> > 	}
> >
>
> As below..
>
> >>  	}
> >>
> >> @@ -954,7 +954,7 @@ static int btrfs_dev_replace_kthread(void *data)
> >>  			      btrfs_device_get_total_bytes(dev_replace->srcdev),
> >>  			      &dev_replace->scrub_progress, 0, 1);
> >>  	ret = btrfs_dev_replace_finishing(fs_info, ret);
> >> -	WARN_ON(ret);
> >> +	WARN_ON(ret && ret != -ECANCELED);
> >
> > This one too, thanks.
>
> btrfs_dev_scrub() can return quite a lot of errno values, which are
> passed here through btrfs_dev_replace_finishing(), so it won't be
> possible to code them all.
>
> (we use -ECANCELED only in replace and balance).

I see, filtering out only the replace error codes makes more sense.

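
The filtering approach agreed on above amounts to a small predicate: warn on everything except success and the one code the replace path is known to return on purpose. The sketch below is a userspace illustration only; `sketch_replace_should_warn()` is a hypothetical name, and the real patch expresses the same condition inline as `WARN_ON(ret && ret != -ECANCELED)`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical predicate: should the caller warn about this return value?
 * Success (0) and a user-requested cancel (-ECANCELED) are expected outcomes;
 * every other value, whatever the scrub produced, still warns.
 */
static bool sketch_replace_should_warn(int ret)
{
	return ret != 0 && ret != -ECANCELED;
}

/* Usage: only the two expected outcomes are filtered out. */
static void sketch_replace_should_warn_demo(void)
{
	assert(!sketch_replace_should_warn(0));
	assert(!sketch_replace_should_warn(-ECANCELED));
	assert(sketch_replace_should_warn(-EIO));
	assert(sketch_replace_should_warn(-ENOMEM));
}
```

Because the scrub can hand back many different errno values, excluding the known-benign code keeps the check short while leaving every unanticipated error visible.
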
diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 1dc8e86546db..9031a362921a 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -497,7 +497,7 @@ static int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
 	ret = btrfs_dev_replace_finishing(fs_info, ret);
 	if (ret == -EINPROGRESS) {
 		ret = BTRFS_IOCTL_DEV_REPLACE_RESULT_SCRUB_INPROGRESS;
-	} else {
+	} else if (ret != -ECANCELED) {
 		WARN_ON(ret);
 	}
 
@@ -954,7 +954,7 @@ static int btrfs_dev_replace_kthread(void *data)
 			      btrfs_device_get_total_bytes(dev_replace->srcdev),
 			      &dev_replace->scrub_progress, 0, 1);
 	ret = btrfs_dev_replace_finishing(fs_info, ret);
-	WARN_ON(ret);
+	WARN_ON(ret && ret != -ECANCELED);
 
 	clear_bit(BTRFS_FS_EXCL_OP, &fs_info->flags);
 	return 0;

When we successfully cancel the replace, its scrub returns -ECANCELED,
which is then passed to btrfs_dev_replace_finishing(). It cleans up
based on the scrub's returned status and propagates the same -ECANCELED
back to the parent function. As of now only the user can cancel the
replace-scrub, so it's ok to quieten the warn here.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 fs/btrfs/dev-replace.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)