btrfs: check rw_devices, not num_devices for restriping
diff mbox series

Message ID 20200108223929.2761-1-josef@toxicpanda.com
State New
Headers show
Series
  • btrfs: check rw_devices, not num_devices for restriping
Related show

Commit Message

Josef Bacik Jan. 8, 2020, 10:39 p.m. UTC
While running xfstests with compression on I noticed I was panicing on
btrfs/154.  I bisected this down to my inc_block_group_ro patches, which
was strange.

What was happening is with my patches we now use btrfs_can_overcommit()
to see if we can flip a block group read only.  Before this would fail
because we weren't taking into account the usable un-allocated space for
allocating chunks.  With my patches we were allowed to do the balance,
which is technically correct.

However this test is testing restriping with a degraded mount, something
that isn't working right because Anand's fix for the test was never
actually merged.

So now we're trying to allocate a chunk and cannot because we want to
allocate a RAID1 chunk, but there's only 1 device that's available for
usage.  This results in an ENOSPC in one of the BUG_ON(ret) paths in
relocation (and a tricky path that is going to take many more patches to
fix.)

But we shouldn't even be making it this far, we don't have enough
devices to restripe.  The problem is we're using btrfs_num_devices(),
which for some reason includes missing devices.  That's not actually
what we want, we want the rw_devices.

Fix this by getting the rw_devices.  With this patch we're no longer
panicing with my other patches applied, and we're in fact erroring out
at the correct spot instead of at inc_block_group_ro.  The fact that
this was working before was just sheer dumb luck.

Fixes: e4d8ec0f65b9 ("Btrfs: implement online profile changing")
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/volumes.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

Comments

Nikolay Borisov Jan. 9, 2020, 11:01 a.m. UTC | #1
On 9.01.20 г. 0:39 ч., Josef Bacik wrote:
> While running xfstests with compression on I noticed I was panicing on
> btrfs/154.  I bisected this down to my inc_block_group_ro patches, which
> was strange.
> 
> What was happening is with my patches we now use btrfs_can_overcommit()
> to see if we can flip a block group read only.  Before this would fail
> because we weren't taking into account the usable un-allocated space for
> allocating chunks.  With my patches we were allowed to do the balance,
> which is technically correct.
> 
> However this test is testing restriping with a degraded mount, something
> that isn't working right because Anand's fix for the test was never
> actually merged.
> 
> So now we're trying to allocate a chunk and cannot because we want to
> allocate a RAID1 chunk, but there's only 1 device that's available for
> usage.  This results in an ENOSPC in one of the BUG_ON(ret) paths in
> relocation (and a tricky path that is going to take many more patches to
> fix.)
> 
> But we shouldn't even be making it this far, we don't have enough
> devices to restripe.  The problem is we're using btrfs_num_devices(),
> which for some reason includes missing devices.  That's not actually
> what we want, we want the rw_devices.
> 
> Fix this by getting the rw_devices.  With this patch we're no longer
> panicing with my other patches applied, and we're in fact erroring out
> at the correct spot instead of at inc_block_group_ro.  The fact that
> this was working before was just sheer dumb luck.
> 
> Fixes: e4d8ec0f65b9 ("Btrfs: implement online profile changing")
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/btrfs/volumes.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 7483521a928b..ee4d440e544e 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -3881,7 +3881,13 @@ int btrfs_balance(struct btrfs_fs_info *fs_info,
>  		}
>  	}
>  
> -	num_devices = btrfs_num_devices(fs_info);
> +	/*
> +	 * device replace only adjusts rw_devices when it is finishing, so take
> +	 * the lock here to make sure we get the right value for rw_devices.
> +	 */
> +	down_read(&fs_info->dev_replace.rwsem);
> +	num_devices = fs_info->fs_devices->rw_devices;
> +	up_read(&fs_info->dev_replace.rwsem);

rw_devices is modified under a myriad of locks, to name a few:

under chunk/device_list mutex in init_new_device

under device_lsit_mutex  in
btrfs_open_devices->open_fs_devices->btrfs_open_one_device

uuid mutex in btrfs_rm_device and also under chunk_mutex in an error
path in the same function.

device_list_mutex in btrfs_rm_dev_replace_remove_srcdev


SO you are only protecting from the replace context in this case. Is
this sufficient?


>  
>  	/*
>  	 * SINGLE profile on-disk has no profile bit, but in-memory we have a
>
Josef Bacik Jan. 9, 2020, 2:12 p.m. UTC | #2
On 1/9/20 6:01 AM, Nikolay Borisov wrote:
> 
> 
> On 9.01.20 г. 0:39 ч., Josef Bacik wrote:
>> While running xfstests with compression on I noticed I was panicing on
>> btrfs/154.  I bisected this down to my inc_block_group_ro patches, which
>> was strange.
>>
>> What was happening is with my patches we now use btrfs_can_overcommit()
>> to see if we can flip a block group read only.  Before this would fail
>> because we weren't taking into account the usable un-allocated space for
>> allocating chunks.  With my patches we were allowed to do the balance,
>> which is technically correct.
>>
>> However this test is testing restriping with a degraded mount, something
>> that isn't working right because Anand's fix for the test was never
>> actually merged.
>>
>> So now we're trying to allocate a chunk and cannot because we want to
>> allocate a RAID1 chunk, but there's only 1 device that's available for
>> usage.  This results in an ENOSPC in one of the BUG_ON(ret) paths in
>> relocation (and a tricky path that is going to take many more patches to
>> fix.)
>>
>> But we shouldn't even be making it this far, we don't have enough
>> devices to restripe.  The problem is we're using btrfs_num_devices(),
>> which for some reason includes missing devices.  That's not actually
>> what we want, we want the rw_devices.
>>
>> Fix this by getting the rw_devices.  With this patch we're no longer
>> panicing with my other patches applied, and we're in fact erroring out
>> at the correct spot instead of at inc_block_group_ro.  The fact that
>> this was working before was just sheer dumb luck.
>>
>> Fixes: e4d8ec0f65b9 ("Btrfs: implement online profile changing")
>> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
>> ---
>>   fs/btrfs/volumes.c | 8 +++++++-
>>   1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index 7483521a928b..ee4d440e544e 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -3881,7 +3881,13 @@ int btrfs_balance(struct btrfs_fs_info *fs_info,
>>   		}
>>   	}
>>   
>> -	num_devices = btrfs_num_devices(fs_info);
>> +	/*
>> +	 * device replace only adjusts rw_devices when it is finishing, so take
>> +	 * the lock here to make sure we get the right value for rw_devices.
>> +	 */
>> +	down_read(&fs_info->dev_replace.rwsem);
>> +	num_devices = fs_info->fs_devices->rw_devices;
>> +	up_read(&fs_info->dev_replace.rwsem);
> 
> rw_devices is modified under a myriad of locks, to name a few:
> 
> under chunk/device_list mutex in init_new_device
> 
> under device_lsit_mutex  in
> btrfs_open_devices->open_fs_devices->btrfs_open_one_device
> 
> uuid mutex in btrfs_rm_device and also under chunk_mutex in an error
> path in the same function.
> 
> device_list_mutex in btrfs_rm_dev_replace_remove_srcdev
> 
> 
> SO you are only protecting from the replace context in this case. Is
> this sufficient?
> 

Hmm I thought the replace_remove_srcdev was under the dev_replace rwsem but it 
looks like it's only under all the other locks.  We're only worried about it 
being messed with while the file system is mounted, and it appears we're at 
least consistent with taking the chunk_mutex when modifying it while the fs is 
mounted, I'll switch it to that.  Thanks,

Josef

Patch
diff mbox series

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 7483521a928b..ee4d440e544e 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3881,7 +3881,13 @@  int btrfs_balance(struct btrfs_fs_info *fs_info,
 		}
 	}
 
-	num_devices = btrfs_num_devices(fs_info);
+	/*
+	 * device replace only adjusts rw_devices when it is finishing, so take
+	 * the lock here to make sure we get the right value for rw_devices.
+	 */
+	down_read(&fs_info->dev_replace.rwsem);
+	num_devices = fs_info->fs_devices->rw_devices;
+	up_read(&fs_info->dev_replace.rwsem);
 
 	/*
 	 * SINGLE profile on-disk has no profile bit, but in-memory we have a