diff mbox series

btrfs: Don't set balance as a running exclusive op in case of skip_balance

Message ID 20211027135334.19880-1-nborisov@suse.com (mailing list archive)
State New, archived
Headers show
Series btrfs: Don't set balance as a running exclusive op in case of skip_balance | expand

Commit Message

Nikolay Borisov Oct. 27, 2021, 1:53 p.m. UTC
Currently balance is set as a running exclusive op in
btrfs_recover_balance in case of remount and a paused balance. That's a
bit eager because btrfs_recover_balance executes always and is not
affected by the 'skip_balance' mount option. This can lead to cases in
which a user has mounted the filesystem with 'skip_balance' due to
running out of space yet is unable to add a device to rectify the ENOSPC
because balance is set as a running exclusive op.

Fix this by setting balance in btrfs_resume_balance_async which takes
into consideration whether 'skip_balance' has been added or not.

Fixes:
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---

I'm inclined to put a Fixes: ed0fb78fb6aa ("Btrfs: bring back balance pause/resume logic")
tag since this is the commit that moved the exclusive op tracking of
resume_balance_async to btrfs_recover_balance. However that's far too back in the 
past and given the commit message of that commit I wonder if doing this 
re-arrangement in older kernels is correct. David what's your take on this, 
perhahps just a stable tag would be sufficient? 


 fs/btrfs/volumes.c | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

Comments

Anand Jain Oct. 28, 2021, 7:57 a.m. UTC | #1
On 27/10/2021 21:53, Nikolay Borisov wrote:
> Currently balance is set as a running exclusive op in
> btrfs_recover_balance in case of remount and a paused balance. That's a
> bit eager because btrfs_recover_balance executes always and is not
> affected by the 'skip_balance' mount option. This can lead to cases in
> which a user has mounted the filesystem with 'skip_balance' due to
> running out of space yet is unable to add a device to rectify the ENOSPC
> because balance is set as a running exclusive op.
> 
> Fix this by setting balance in btrfs_resume_balance_async which takes
> into consideration whether 'skip_balance' has been added or not.
> 

Hmm, no. I roughly remember it was purposefully to avoid replacing
intervened with the half-completed/paused balance. As below.
Not sure if it is ok now?

Before patch: Can't replace with a paused balance.

------------
$ btrfs bal start --full-balance /btrfs
balance paused by user

$ btrfs rep start /dev/vg/scratch1 /dev/vg/scratch0 /btrfs
ERROR: unable to start device replace, another exclusive operation 
'balance' in progress

$ mount -o remount,ro /dev/vg/scratch1 /btrfs
$ mount -o remount,rw,skip_balance /dev/vg/scratch1 /btrfs

$ btrfs rep start /dev/vg/scratch1 /dev/vg/scratch0 /btrfs
ERROR: unable to start device replace, another exclusive operation 
'balance' in progress

$ umount /btrfs
$ mount -o skip_balance /dev/vg/scratch1 /btrfs

$ btrfs rep start /dev/vg/scratch1 /dev/vg/scratch0 /btrfs
ERROR: unable to start device replace, another exclusive operation 
'balance' in progress
------------


After patch: Replace is successful even if there is a paused balance.

------------
$ mount -o skip_balance /dev/vg/scratch1 /btrfs

[ 1367.567379] BTRFS info (device dm-1): balance: resume skipped

$ btrfs rep start /dev/vg/scratch1 /dev/vg/scratch0 /btrfs   <----

[ 1391.318192] BTRFS info (device dm-1): dev_replace from 
/dev/mapper/vg-scratch1 (devid 1) to /dev/mapper/vg-scratch0 started
[ 1408.243475] BTRFS info (device dm-1): dev_replace from 
/dev/mapper/vg-scratch1 (devid 1) to /dev/mapper/vg-scratch0 finished


$ btrfs bal resume /btrfs
Done, had to relocate 5 out of 12 chunks
------------

Thanks, Anand


> Fixes:
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
> ---
> 
> I'm inclined to put a Fixes: ed0fb78fb6aa ("Btrfs: bring back balance pause/resume logic")
> tag since this is the commit that moved the exclusive op tracking of
> resume_balance_async to btrfs_recover_balance. However that's far too back in the
> past and given the commit message of that commit I wonder if doing this
> re-arrangement in older kernels is correct. David what's your take on this,
> perhahps just a stable tag would be sufficient?
> 
> 
>   fs/btrfs/volumes.c | 28 ++++++++++++++--------------
>   1 file changed, 14 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 546bf1146b2d..bff52fa1b255 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -4432,6 +4432,20 @@ int btrfs_resume_balance_async(struct btrfs_fs_info *fs_info)
>   		return 0;
>   	}
>   
> +	/*
> +	 * This should never happen, as the paused balance state is recovered
> +	 * during mount without any chance of other exclusive ops to collide.
> +	 *
> +	 * This gives the exclusive op status to balance and keeps in paused
> +	 * state until user intervention (cancel or umount). If the ownership
> +	 * cannot be assigned, show a message but do not fail. The balance
> +	 * is in a paused state and must have fs_info::balance_ctl properly
> +	 * set up.
> +	 */
> +	if (!btrfs_exclop_start(fs_info, BTRFS_EXCLOP_BALANCE))
> +		btrfs_warn(fs_info,
> +	"balance: cannot set exclusive op status, resume manually");
> +
>   	/*
>   	 * A ro->rw remount sequence should continue with the paused balance
>   	 * regardless of who pauses it, system or the user as of now, so set
> @@ -4490,20 +4504,6 @@ int btrfs_recover_balance(struct btrfs_fs_info *fs_info)
>   	btrfs_balance_sys(leaf, item, &disk_bargs);
>   	btrfs_disk_balance_args_to_cpu(&bctl->sys, &disk_bargs);
>   
> -	/*
> -	 * This should never happen, as the paused balance state is recovered
> -	 * during mount without any chance of other exclusive ops to collide.
> -	 *
> -	 * This gives the exclusive op status to balance and keeps in paused
> -	 * state until user intervention (cancel or umount). If the ownership
> -	 * cannot be assigned, show a message but do not fail. The balance
> -	 * is in a paused state and must have fs_info::balance_ctl properly
> -	 * set up.
> -	 */
> -	if (!btrfs_exclop_start(fs_info, BTRFS_EXCLOP_BALANCE))
> -		btrfs_warn(fs_info,
> -	"balance: cannot set exclusive op status, resume manually");
> -
>   	btrfs_release_path(path);
>   
>   	mutex_lock(&fs_info->balance_mutex);
>
Nikolay Borisov Oct. 28, 2021, 8:30 a.m. UTC | #2
On 28.10.21 г. 10:57, Anand Jain wrote:
> On 27/10/2021 21:53, Nikolay Borisov wrote:
>> Currently balance is set as a running exclusive op in
>> btrfs_recover_balance in case of remount and a paused balance. That's a
>> bit eager because btrfs_recover_balance executes always and is not
>> affected by the 'skip_balance' mount option. This can lead to cases in
>> which a user has mounted the filesystem with 'skip_balance' due to
>> running out of space yet is unable to add a device to rectify the ENOSPC
>> because balance is set as a running exclusive op.
>>
>> Fix this by setting balance in btrfs_resume_balance_async which takes
>> into consideration whether 'skip_balance' has been added or not.
>>
> 
> Hmm, no. I roughly remember it was purposefully to avoid replacing
> intervened with the half-completed/paused balance. As below.
> Not sure if it is ok now?
> 
> Before patch: Can't replace with a paused balance.
> 
> ------------
> $ btrfs bal start --full-balance /btrfs
> balance paused by user
> 
> $ btrfs rep start /dev/vg/scratch1 /dev/vg/scratch0 /btrfs
> ERROR: unable to start device replace, another exclusive operation
> 'balance' in progress
> 
> $ mount -o remount,ro /dev/vg/scratch1 /btrfs
> $ mount -o remount,rw,skip_balance /dev/vg/scratch1 /btrfs
> 
> $ btrfs rep start /dev/vg/scratch1 /dev/vg/scratch0 /btrfs
> ERROR: unable to start device replace, another exclusive operation
> 'balance' in progress
> 
> $ umount /btrfs
> $ mount -o skip_balance /dev/vg/scratch1 /btrfs
> 
> $ btrfs rep start /dev/vg/scratch1 /dev/vg/scratch0 /btrfs
> ERROR: unable to start device replace, another exclusive operation
> 'balance' in progress
> ------------
> 
> 
> After patch: Replace is successful even if there is a paused balance.

That's a fair point, so the patch needs to be augmented such that only
ADD is allowed but other exclusive ops are not. I will look into it.

<snip>
diff mbox series

Patch

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 546bf1146b2d..bff52fa1b255 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4432,6 +4432,20 @@  int btrfs_resume_balance_async(struct btrfs_fs_info *fs_info)
 		return 0;
 	}
 
+	/*
+	 * This should never happen, as the paused balance state is recovered
+	 * during mount without any chance of other exclusive ops to collide.
+	 *
+	 * This gives the exclusive op status to balance and keeps in paused
+	 * state until user intervention (cancel or umount). If the ownership
+	 * cannot be assigned, show a message but do not fail. The balance
+	 * is in a paused state and must have fs_info::balance_ctl properly
+	 * set up.
+	 */
+	if (!btrfs_exclop_start(fs_info, BTRFS_EXCLOP_BALANCE))
+		btrfs_warn(fs_info,
+	"balance: cannot set exclusive op status, resume manually");
+
 	/*
 	 * A ro->rw remount sequence should continue with the paused balance
 	 * regardless of who pauses it, system or the user as of now, so set
@@ -4490,20 +4504,6 @@  int btrfs_recover_balance(struct btrfs_fs_info *fs_info)
 	btrfs_balance_sys(leaf, item, &disk_bargs);
 	btrfs_disk_balance_args_to_cpu(&bctl->sys, &disk_bargs);
 
-	/*
-	 * This should never happen, as the paused balance state is recovered
-	 * during mount without any chance of other exclusive ops to collide.
-	 *
-	 * This gives the exclusive op status to balance and keeps in paused
-	 * state until user intervention (cancel or umount). If the ownership
-	 * cannot be assigned, show a message but do not fail. The balance
-	 * is in a paused state and must have fs_info::balance_ctl properly
-	 * set up.
-	 */
-	if (!btrfs_exclop_start(fs_info, BTRFS_EXCLOP_BALANCE))
-		btrfs_warn(fs_info,
-	"balance: cannot set exclusive op status, resume manually");
-
 	btrfs_release_path(path);
 
 	mutex_lock(&fs_info->balance_mutex);