btrfs: fix return value when a race occurs between balance and cancel/pause

Message ID 20230810034810.23934-1-xiaoshoukui@gmail.com (mailing list archive)
State New, archived

Commit Message

xiaoshoukui Aug. 10, 2023, 3:48 a.m. UTC
If a pause or cancel ioctl is issued after __btrfs_balance has made its
last check for pending pause or cancel requests, the ioctl still returns 0.
This misleads the user into believing that the request succeeded, when in
fact the balance was neither paused nor canceled.

In that race, a non-zero errno should be returned to the user.

Signed-off-by: xiaoshoukui <xiaoshoukui@ruijie.com.cn>
---
 fs/btrfs/fs.h      |  6 ++++++
 fs/btrfs/volumes.c | 14 +++++++++-----
 2 files changed, 15 insertions(+), 5 deletions(-)
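
The interleaving described above can be modeled in user space. This is
only a sketch, not kernel code: struct balance_model, model_last_check(),
model_finish(), model_pause_result() and simulate_pause() are hypothetical
names, and the threads are replaced by a fixed call order so that the
result is deterministic.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical user-space model of the balance state; not kernel code. */
struct balance_model {
	bool running;   /* stands in for BTRFS_FS_BALANCE_RUNNING */
	bool paused;    /* stands in for BTRFS_FS_BALANCE_PAUSED from this patch */
	int pause_req;  /* stands in for fs_info->balance_pause_req */
};

/* Last look at pending requests inside the balance loop. */
static int model_last_check(const struct balance_model *m)
{
	return m->pause_req > 0 ? -ECANCELED : 0;
}

/* Tail of the balance call: record a pause only if one was honored. */
static void model_finish(struct balance_model *m, int ret)
{
	if (ret == -ECANCELED && m->pause_req > 0)
		m->paused = true;
	m->running = false;
}

/* Value the pause request reports once the balance is no longer running. */
static int model_pause_result(struct balance_model *m, bool patched)
{
	m->pause_req--;
	if (!patched)
		return 0;	/* old behavior: always report success */
	return m->paused ? 0 : -EINVAL;
}

/*
 * Run one interleaving. If pause_before_check is false, the pause request
 * arrives after the loop's last check, which is the race in question.
 */
int simulate_pause(bool pause_before_check, bool patched)
{
	struct balance_model m = { .running = true };
	int ret;

	if (pause_before_check)
		m.pause_req++;
	ret = model_last_check(&m);
	if (!pause_before_check)
		m.pause_req++;	/* request arrives too late to be seen */
	model_finish(&m, ret);
	return model_pause_result(&m, patched);
}
```

In this model, simulate_pause(false, false) returns 0 even though nothing
was paused, while simulate_pause(false, true) returns -EINVAL. A request
that arrives before the last check returns 0 in both variants.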

Comments

David Sterba Aug. 10, 2023, 12:05 p.m. UTC | #1
On Wed, Aug 09, 2023 at 11:48:10PM -0400, xiaoshoukui wrote:
> Issue a pause or cancel IOCTL request after judging that there is no
> pause or cancel request on the path of __btrfs_balance to return 0,
> which will mislead the user that the pause or cancel requests are
> successful.In fact, the balance request has not been paused or canceled.
> 
> On that race condition, a non-zero errno should be returned to the user.
> 
> Signed-off-by: xiaoshoukui <xiaoshoukui@ruijie.com.cn>
> ---
>  fs/btrfs/fs.h      |  6 ++++++
>  fs/btrfs/volumes.c | 14 +++++++++-----
>  2 files changed, 15 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
> index 203d2a267828..c27def881922 100644
> --- a/fs/btrfs/fs.h
> +++ b/fs/btrfs/fs.h
> @@ -93,6 +93,12 @@ enum {
>  	 */
>  	BTRFS_FS_BALANCE_RUNNING,
>  
> +	/* Indicate that balance has been paused. */
> +	BTRFS_FS_BALANCE_PAUSED,
> +
> +	/* Indicate that balance has been canceled. */
> +	BTRFS_FS_BALANCE_CANCELED,

I don't like that the status is tracked in several bits like that, in
addition to the already complicated locking and state transitions of
restarted balance. I think this is a hint that some things can be
simplified or combined together, though it could be difficult.
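
One possible reading of "combined together" is a single outcome value
instead of two separate bits. The sketch below is an assumption for
illustration only and was not proposed in this thread: enum
balance_outcome, classify_balance_ret() and request_result() are
hypothetical names, and the classification simply mirrors the if/else
chain in btrfs_balance() shown in the patch.

```c
#include <errno.h>

/* Hypothetical: one outcome field instead of separate PAUSED/CANCELED bits. */
enum balance_outcome {
	BALANCE_OUTCOME_NONE,		/* still running or never started */
	BALANCE_OUTCOME_FINISHED,	/* ended on its own, with any other status */
	BALANCE_OUTCOME_PAUSED,
	BALANCE_OUTCOME_CANCELED,
};

/* Classify the balance loop's return value, in the same order as the patch. */
enum balance_outcome classify_balance_ret(int ret, int pause_pending)
{
	if (ret == -ECANCELED && pause_pending)
		return BALANCE_OUTCOME_PAUSED;
	if (ret == -ECANCELED || ret == -EINTR)
		return BALANCE_OUTCOME_CANCELED;
	return BALANCE_OUTCOME_FINISHED;
}

/* What a pause or cancel request would report for the recorded outcome. */
int request_result(enum balance_outcome got, enum balance_outcome wanted)
{
	return got == wanted ? 0 : -EINVAL;
}
```

With a single value, "paused" and "canceled" cannot both be set at once,
which is one state combination the two-bit approach has to rule out by
convention.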

xiaoshoukui Aug. 11, 2023, 2:35 a.m. UTC | #2
My first thought was to solve the problem with locks, but in practice it
turned out that this would make the original code even more complex.

Tracking the status this way may be just a workaround. A better solution
may be to refactor the balance-related code.

I think the interface provided to the user is very important for reliability.
I look forward to a better solution. If needed, I can put some effort into
testing and reproducing.
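
To illustrate why the return value matters to callers, here is a rough
sketch of how a user-space tool might interpret errno after a pause or
cancel request (issued through BTRFS_IOC_BALANCE_CTL). The helper
describe_balance_ctl_errno() is a hypothetical name, and the EINVAL case
assumes the behavior proposed in this patch.

```c
#include <errno.h>

/*
 * Hypothetical helper: map the errno seen after a pause/cancel request to
 * a message. The EINVAL meaning assumes this patch is applied; the same
 * errno can also be reported for other reasons, such as an invalid command.
 */
const char *describe_balance_ctl_errno(int err)
{
	switch (err) {
	case 0:
		return "request took effect";
	case ENOTCONN:
		return "no balance is running";
	case EINVAL:
		return "balance ended before the request took effect";
	default:
		return "unexpected error";
	}
}
```

Without the patch, the third case is indistinguishable from the first,
because the kernel reports 0 for both.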

Patch

diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
index 203d2a267828..c27def881922 100644
--- a/fs/btrfs/fs.h
+++ b/fs/btrfs/fs.h
@@ -93,6 +93,12 @@  enum {
 	 */
 	BTRFS_FS_BALANCE_RUNNING,
 
+	/* Indicate that balance has been paused. */
+	BTRFS_FS_BALANCE_PAUSED,
+
+	/* Indicate that balance has been canceled. */
+	BTRFS_FS_BALANCE_CANCELED,
+
 	/*
 	 * Indicate that relocation of a chunk has started, it's set per chunk
 	 * and is toggled between chunks.
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 2ecb76cf3d91..839ce1808f23 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4267,7 +4267,6 @@  int btrfs_balance(struct btrfs_fs_info *fs_info,
 	u64 num_devices;
 	unsigned seq;
 	bool reducing_redundancy;
-	bool paused = false;
 	int i;
 
 	if (btrfs_fs_closing(fs_info) ||
@@ -4390,6 +4389,8 @@  int btrfs_balance(struct btrfs_fs_info *fs_info,
 	ASSERT(!test_bit(BTRFS_FS_BALANCE_RUNNING, &fs_info->flags));
 	set_bit(BTRFS_FS_BALANCE_RUNNING, &fs_info->flags);
 	describe_balance_start_or_resume(fs_info);
+	clear_bit(BTRFS_FS_BALANCE_PAUSED, &fs_info->flags);
+	clear_bit(BTRFS_FS_BALANCE_CANCELED, &fs_info->flags);
 	mutex_unlock(&fs_info->balance_mutex);
 
 	ret = __btrfs_balance(fs_info);
@@ -4398,7 +4399,7 @@  int btrfs_balance(struct btrfs_fs_info *fs_info,
 	if (ret == -ECANCELED && atomic_read(&fs_info->balance_pause_req)) {
 		btrfs_info(fs_info, "balance: paused");
 		btrfs_exclop_balance(fs_info, BTRFS_EXCLOP_BALANCE_PAUSED);
-		paused = true;
+		set_bit(BTRFS_FS_BALANCE_PAUSED, &fs_info->flags);
 	}
 	/*
 	 * Balance can be canceled by:
@@ -4415,8 +4416,10 @@  int btrfs_balance(struct btrfs_fs_info *fs_info,
 	 *
 	 * So here we only check the return value to catch canceled balance.
 	 */
-	else if (ret == -ECANCELED || ret == -EINTR)
+	else if (ret == -ECANCELED || ret == -EINTR) {
 		btrfs_info(fs_info, "balance: canceled");
+		set_bit(BTRFS_FS_BALANCE_CANCELED, &fs_info->flags);
+	}
 	else
 		btrfs_info(fs_info, "balance: ended with status: %d", ret);
 
@@ -4428,7 +4431,7 @@  int btrfs_balance(struct btrfs_fs_info *fs_info,
 	}
 
 	/* We didn't pause, we can clean everything up. */
-	if (!paused) {
+	if (!test_bit(BTRFS_FS_BALANCE_PAUSED, &fs_info->flags)) {
 		reset_balance_state(fs_info);
 		btrfs_exclop_finish(fs_info);
 	}
@@ -4587,6 +4590,7 @@  int btrfs_pause_balance(struct btrfs_fs_info *fs_info)
 		/* we are good with balance_ctl ripped off from under us */
 		BUG_ON(test_bit(BTRFS_FS_BALANCE_RUNNING, &fs_info->flags));
 		atomic_dec(&fs_info->balance_pause_req);
+		ret = test_bit(BTRFS_FS_BALANCE_PAUSED, &fs_info->flags) ? 0 : -EINVAL;
 	} else {
 		ret = -ENOTCONN;
 	}
@@ -4642,7 +4646,7 @@  int btrfs_cancel_balance(struct btrfs_fs_info *fs_info)
 		test_bit(BTRFS_FS_BALANCE_RUNNING, &fs_info->flags));
 	atomic_dec(&fs_info->balance_cancel_req);
 	mutex_unlock(&fs_info->balance_mutex);
-	return 0;
+	return test_bit(BTRFS_FS_BALANCE_CANCELED, &fs_info->flags) ? 0 : -EINVAL;
 }
 
 int btrfs_uuid_scan_kthread(void *data)