diff mbox series

[1/2] btrfs: fix assert during replace-cancel of suspended replace-operation

Message ID 3eb88dc3914667123ebb0823bbab9e07b24cf099.1660299977.git.anand.jain@oracle.com (mailing list archive)
State New, archived
Headers show
Series fix issues during suspended replace operation | expand

Commit Message

Anand Jain Aug. 12, 2022, 10:32 a.m. UTC
If the filesystem mounts with the replace-operation in a suspended state
and try to cancel the suspended replace-operation, we hit the assert. The
assert came from the commit fe97e2e173af ("btrfs: dev-replace: replace's
scrub must not be running in suspended state") that was actually not
required. So just remove it.

 $ mount /dev/sda5 /btrfs

    BTRFS info (device sda5): cannot continue dev_replace, tgtdev is missing
    BTRFS info (device sda5): you may cancel the operation after 'mount -o degraded'

 $ mount -o degraded /dev/sda5 /btrfs <-- success.

 $ btrfs replace cancel /btrfs

    kernel: assertion failed: ret != -ENOTCONN, in fs/btrfs/dev-replace.c:1131
    kernel: ------------[ cut here ]------------
    kernel: kernel BUG at fs/btrfs/ctree.h:3750!

After the patch:

 $ btrfs replace cancel /btrfs

    BTRFS info (device sda5): suspended dev_replace from /dev/sda5 (devid 1) to <missing disk> canceled

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 fs/btrfs/dev-replace.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Comments

Filipe Manana Sept. 22, 2022, 10 a.m. UTC | #1
On Fri, Aug 12, 2022 at 11:56 AM Anand Jain <anand.jain@oracle.com> wrote:
>
> If the filesystem mounts with the replace-operation in a suspended state
> and try to cancel the suspended replace-operation, we hit the assert. The
> assert came from the commit fe97e2e173af ("btrfs: dev-replace: replace's
> scrub must not be running in suspended state") that was actually not
> required. So just remove it.
>
>  $ mount /dev/sda5 /btrfs
>
>     BTRFS info (device sda5): cannot continue dev_replace, tgtdev is missing
>     BTRFS info (device sda5): you may cancel the operation after 'mount -o degraded'
>
>  $ mount -o degraded /dev/sda5 /btrfs <-- success.
>
>  $ btrfs replace cancel /btrfs
>
>     kernel: assertion failed: ret != -ENOTCONN, in fs/btrfs/dev-replace.c:1131
>     kernel: ------------[ cut here ]------------
>     kernel: kernel BUG at fs/btrfs/ctree.h:3750!
>
> After the patch:
>
>  $ btrfs replace cancel /btrfs
>
>     BTRFS info (device sda5): suspended dev_replace from /dev/sda5 (devid 1) to <missing disk> canceled

Anand, can you please add a test case to fstests?
This is a scenario with no coverage at all in fstests, therefore
specially useful to have it there.

Thanks.

>
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> ---
>  fs/btrfs/dev-replace.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
> index 488f2105c5d0..9d46a702bc11 100644
> --- a/fs/btrfs/dev-replace.c
> +++ b/fs/btrfs/dev-replace.c
> @@ -1124,8 +1124,7 @@ int btrfs_dev_replace_cancel(struct btrfs_fs_info *fs_info)
>                 up_write(&dev_replace->rwsem);
>
>                 /* Scrub for replace must not be running in suspended state */
> -               ret = btrfs_scrub_cancel(fs_info);
> -               ASSERT(ret != -ENOTCONN);
> +               btrfs_scrub_cancel(fs_info);
>
>                 trans = btrfs_start_transaction(root, 0);
>                 if (IS_ERR(trans)) {
> --
> 2.33.1
>
Anand Jain Sept. 22, 2022, 11:28 a.m. UTC | #2
On 22/09/2022 18:00, Filipe Manana wrote:
> On Fri, Aug 12, 2022 at 11:56 AM Anand Jain <anand.jain@oracle.com> wrote:
>>
>> If the filesystem mounts with the replace-operation in a suspended state
>> and try to cancel the suspended replace-operation, we hit the assert. The
>> assert came from the commit fe97e2e173af ("btrfs: dev-replace: replace's
>> scrub must not be running in suspended state") that was actually not
>> required. So just remove it.
>>
>>   $ mount /dev/sda5 /btrfs
>>
>>      BTRFS info (device sda5): cannot continue dev_replace, tgtdev is missing
>>      BTRFS info (device sda5): you may cancel the operation after 'mount -o degraded'
>>
>>   $ mount -o degraded /dev/sda5 /btrfs <-- success.
>>
>>   $ btrfs replace cancel /btrfs
>>
>>      kernel: assertion failed: ret != -ENOTCONN, in fs/btrfs/dev-replace.c:1131
>>      kernel: ------------[ cut here ]------------
>>      kernel: kernel BUG at fs/btrfs/ctree.h:3750!
>>
>> After the patch:
>>
>>   $ btrfs replace cancel /btrfs
>>
>>      BTRFS info (device sda5): suspended dev_replace from /dev/sda5 (devid 1) to <missing disk> canceled
> 
> Anand, can you please add a test case to fstests?
> This is a scenario with no coverage at all in fstests, therefore
> specially useful to have it there.
> 

I thought about it before and found that unless we implement the
replace-pause sub-command, we can't get a pending replace item in
an unmounted btrfs using a script.

However, to test it manually, I did an abrupt reboot (or power-off,
I think).

Thanks.

> Thanks.
> 
>>
>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>> ---
>>   fs/btrfs/dev-replace.c | 3 +--
>>   1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
>> index 488f2105c5d0..9d46a702bc11 100644
>> --- a/fs/btrfs/dev-replace.c
>> +++ b/fs/btrfs/dev-replace.c
>> @@ -1124,8 +1124,7 @@ int btrfs_dev_replace_cancel(struct btrfs_fs_info *fs_info)
>>                  up_write(&dev_replace->rwsem);
>>
>>                  /* Scrub for replace must not be running in suspended state */
>> -               ret = btrfs_scrub_cancel(fs_info);
>> -               ASSERT(ret != -ENOTCONN);
>> +               btrfs_scrub_cancel(fs_info);
>>
>>                  trans = btrfs_start_transaction(root, 0);
>>                  if (IS_ERR(trans)) {
>> --
>> 2.33.1
>>
> 
>
diff mbox series

Patch

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 488f2105c5d0..9d46a702bc11 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -1124,8 +1124,7 @@  int btrfs_dev_replace_cancel(struct btrfs_fs_info *fs_info)
 		up_write(&dev_replace->rwsem);
 
 		/* Scrub for replace must not be running in suspended state */
-		ret = btrfs_scrub_cancel(fs_info);
-		ASSERT(ret != -ENOTCONN);
+		btrfs_scrub_cancel(fs_info);
 
 		trans = btrfs_start_transaction(root, 0);
 		if (IS_ERR(trans)) {