mbox series

[0/5] Misc cleanups in balance code

Message ID 1540554201-11305-1-git-send-email-nborisov@suse.com (mailing list archive)
Headers show
Series Misc cleanups in balance code | expand

Message

Nikolay Borisov Oct. 26, 2018, 11:43 a.m. UTC
While investigating the balance hang I came across various inconsistencies in 
the source. This series aims to fix those. 

The first patch is (I believe) a fix to a longstanding bug that could cause 
balance to fail due to ENOSPC. The code no properly ensures that there is 
at least 1g of unallocated space on every device being balance.

Patch 2 makes btrfs_can_relocate a bit more obvious and removes leftovers from 
previous cleanup

Patches 3/4/5 remove some redundant code from various functions. 

This series has survived multiple xfstest runs. 

Nikolay Borisov (5):
  btrfs: Ensure at least 1g is free for balance
  btrfs: Refactor btrfs_can_relocate
  btrfs: Remove superfluous check form btrfs_remove_chunk
  btrfs: Sink find_lock_delalloc_range's 'max_bytes' argument
  btrfs: Replace BUG_ON with ASSERT in find_lock_delalloc_range

 fs/btrfs/ctree.h                 |  2 +-
 fs/btrfs/extent-tree.c           | 39 ++++++++++++++-------------------------
 fs/btrfs/extent_io.c             | 13 ++++++-------
 fs/btrfs/extent_io.h             |  2 +-
 fs/btrfs/tests/extent-io-tests.c | 10 +++++-----
 fs/btrfs/volumes.c               | 17 +++++++----------
 6 files changed, 34 insertions(+), 49 deletions(-)

Comments

David Sterba Nov. 16, 2018, 3:18 p.m. UTC | #1
On Fri, Oct 26, 2018 at 02:43:16PM +0300, Nikolay Borisov wrote:
> While investigating the balance hang I came across various inconsistencies in 
> the source. This series aims to fix those. 
> 
> The first patch is (I believe) a fix to a longstanding bug that could cause 
> balance to fail due to ENOSPC. The code no properly ensures that there is 
> at least 1g of unallocated space on every device being balance.
> 
> Patch 2 makes btrfs_can_relocate a bit more obvious and removes leftovers from 
> previous cleanup
> 
> Patches 3/4/5 remove some redundant code from various functions. 
> 
> This series has survived multiple xfstest runs. 
> 
> Nikolay Borisov (5):
>   btrfs: Ensure at least 1g is free for balance
>   btrfs: Refactor btrfs_can_relocate
>   btrfs: Remove superfluous check form btrfs_remove_chunk
>   btrfs: Sink find_lock_delalloc_range's 'max_bytes' argument
>   btrfs: Replace BUG_ON with ASSERT in find_lock_delalloc_range

Patches 2-5 on the way to misc-next, thanks. The first one can have user
visible consequences, so I'd rather first find out why the 1MB was there
and if it's really a bug and what exactly will change when it's 1G.
Nikolay Borisov Nov. 16, 2018, 3:36 p.m. UTC | #2
On 16.11.18 г. 17:18 ч., David Sterba wrote:
> On Fri, Oct 26, 2018 at 02:43:16PM +0300, Nikolay Borisov wrote:
>> While investigating the balance hang I came across various inconsistencies in 
>> the source. This series aims to fix those. 
>>
>> The first patch is (I believe) a fix to a longstanding bug that could cause 
>> balance to fail due to ENOSPC. The code no properly ensures that there is 
>> at least 1g of unallocated space on every device being balance.
>>
>> Patch 2 makes btrfs_can_relocate a bit more obvious and removes leftovers from 
>> previous cleanup
>>
>> Patches 3/4/5 remove some redundant code from various functions. 
>>
>> This series has survived multiple xfstest runs. 
>>
>> Nikolay Borisov (5):
>>   btrfs: Ensure at least 1g is free for balance
>>   btrfs: Refactor btrfs_can_relocate
>>   btrfs: Remove superfluous check form btrfs_remove_chunk
>>   btrfs: Sink find_lock_delalloc_range's 'max_bytes' argument
>>   btrfs: Replace BUG_ON with ASSERT in find_lock_delalloc_range
> 
> Patches 2-5 on the way to misc-next, thanks. The first one can have user
> visible consequences, so I'd rather first find out why the 1MB was there
> and if it's really a bug and what exactly will change when it's 1G.

I'm fine with that, one thing I don't agree with, though, is the
conclusion that patch 1 has user visible consequence. As a matter of
fact it does not. If btrfs_shrink_device fails with ENOSPC we just
break, i/e we don't try to free balance space for any of the other
devices, but this doesn't stop from balance actually continuing. As it
stands today I think "step one" in __btrfs_balance is more or less
null-op, i.e cycles are wasted since all we do is shrink every device by
1mb, ensuring one more mb is already free. I think this is very
insufficient for a balance operation and when it succeeds during normal
operation it's due to the device already having enough unallocated space
before the balance.

Anyway, I will speak with Chris to try and find out why the code uses 1mb

>