[v2] btrfs: Add debug warning for new block group reservations
diff mbox

Message ID 20160526070608.10837-1-quwenruo@cn.fujitsu.com
State New
Headers show

Commit Message

Qu Wenruo May 26, 2016, 7:06 a.m. UTC
Since the we are using atomic and wait queue for block group
reservations and it's not controlled by lockdep, we need pay much more
attention to any modification to write path.

Or it's very easy to under flow block group reservations and cause lock
balance.

Add warning on for dec_block_group_reservations() if the reservations is
already minus.

Although such warning doesn't always catch the directly caller, but should
provides good enough clue for later debug.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
v2:
   Fix compile error
   Change to WARN_ON_ONCE(), as when it underflows, it will always under
   flow
---
 fs/btrfs/extent-tree.c | 1 +
 1 file changed, 1 insertion(+)

Comments

sri May 26, 2016, 8:47 a.m. UTC | #1
In traditional file systems such as ext3/ext4 when a snapshot (say with 
LVM2) is taken, all I/O are frozen and entire file system including meta 
dta is part of snapshot. 

Btrfs snapshot also managed by file system itself and btrfs snapshot is 
actually does a sub volume snapshot which makes a portion of logical 
entity can be made read only so that no I/Os happen to that tree.

Since snapshot is at sub volume level, other metadata such as extent tree, 
chunk tree, super block etc.. will still in active state and I/Os can go 
on them will not get flushed to disk (which is not required for btrfs sub 
volume snapshot).


Since btrfs is always consistent (With COW and check-sum), does it holds 
good if I copy blocks at disk level to another disk starting form 0 to end 
of disk ?
My guess is NO.


Or is there a way to freeze I/Os at btrfs level so that entire metadata is 
flushed to the disk ( all trees such as extent tree, chunk tree or what 
ever metadata of btrfs file system used at VFS layer) so that  disk to 
disk copy can be triggered so that I will be having all sub volumes are 
copied to an alternate disk which I should be able to use on another 
machine or if the original disk is gone for toss. 





--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hugo Mills May 26, 2016, 9 a.m. UTC | #2
On Thu, May 26, 2016 at 08:47:39AM +0000, sri wrote:
> In traditional file systems such as ext3/ext4 when a snapshot (say with 
> LVM2) is taken, all I/O are frozen and entire file system including meta 
> dta is part of snapshot. 
> 
> Btrfs snapshot also managed by file system itself and btrfs snapshot is 
> actually does a sub volume snapshot which makes a portion of logical 
> entity can be made read only so that no I/Os happen to that tree.
> 
> Since snapshot is at sub volume level, other metadata such as extent tree, 
> chunk tree, super block etc.. will still in active state and I/Os can go 
> on them will not get flushed to disk (which is not required for btrfs sub 
> volume snapshot).
> 
> 
> Since btrfs is always consistent (With COW and check-sum), does it holds 
> good if I copy blocks at disk level to another disk starting form 0 to end 
> of disk ?
> My guess is NO.

   Correct.

> Or is there a way to freeze I/Os at btrfs level so that entire metadata is 
> flushed to the disk ( all trees such as extent tree, chunk tree or what 
> ever metadata of btrfs file system used at VFS layer) so that  disk to 
> disk copy can be triggered so that I will be having all sub volumes are 
> copied to an alternate disk which I should be able to use on another 
> machine or if the original disk is gone for toss. 

   Naïvely, doing that within btrfs would effectively hang the FS for
the lifetime of the suspension, so it would be a really bad idea.

   In general, this would be the use-case for LVM. However, that's a
dangerous operation with btrfs, because the snapshot ends up being
included in the original FS (because it has the same UUID), and thus
causes the kernel to get very confused, leading to probable FS
corruption.

   Hugo.
Filipe Manana May 27, 2016, 9:51 a.m. UTC | #3
On Thu, May 26, 2016 at 8:06 AM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
> Since the we are using atomic and wait queue for block group
> reservations and it's not controlled by lockdep, we need pay much more
> attention to any modification to write path.
>
> Or it's very easy to under flow block group reservations and cause lock
> balance.

Ok, have you observed this happening in practice?

There's nothing wrong with adding this warning, but it makes me
curious why you do it only here, specially if you haven't experienced
a bug. That is, there are other places (some recent others not so
recent) where we use an atomic for waiting and waking up.

Also the patch subject "btrfs: Add debug warning for new block group
reservations" is very confusing. You meant to say something like "Add
warning when decrementing a block group's reservations counter".

thanks

>
> Add warning on for dec_block_group_reservations() if the reservations is
> already minus.
>
> Although such warning doesn't always catch the directly caller, but should
> provides good enough clue for later debug.
>
> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
> ---
> v2:
>    Fix compile error
>    Change to WARN_ON_ONCE(), as when it underflows, it will always under
>    flow
> ---
>  fs/btrfs/extent-tree.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 9424864..47ce96b 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -6243,6 +6243,7 @@ void btrfs_dec_block_group_reservations(struct btrfs_fs_info *fs_info,
>         ASSERT(bg);
>         if (atomic_dec_and_test(&bg->reservations))
>                 wake_up_atomic_t(&bg->reservations);
> +       WARN_ON_ONCE(atomic_read(&bg->reservations) < 0);
>         btrfs_put_block_group(bg);
>  }
>
> --
> 2.8.3
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Qu Wenruo May 27, 2016, 10:06 a.m. UTC | #4
Filipe Manana wrote on 2016/05/27 10:51 +0100:
> On Thu, May 26, 2016 at 8:06 AM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>> Since the we are using atomic and wait queue for block group
>> reservations and it's not controlled by lockdep, we need pay much more
>> attention to any modification to write path.
>>
>> Or it's very easy to under flow block group reservations and cause lock
>> balance.
>
> Ok, have you observed this happening in practice?

Yes, when rebasing dedupe patchset.

Git rebasing(to for-linus-4.7) without extra modification, will cause 
balance to be blocked with dedupe test case btrfs/203.

For dedupe, cow_file_range() may call dec_block_group_reserverations() 
while the data extent is not newly allocated, but reusing existing one.

So there dev_block_group_reservations() will under flow reservations.

And blocked backtrace leads to block group reservations.

With this patch, it takes much less time to fix dedupe patchset.
And I assume later modification to run_delalloc can easily cause such 
problem, so added this patch.
>
> There's nothing wrong with adding this warning, but it makes me
> curious why you do it only here, specially if you haven't experienced
> a bug. That is, there are other places (some recent others not so
> recent) where we use an atomic for waiting and waking up.
>
> Also the patch subject "btrfs: Add debug warning for new block group
> reservations" is very confusing. You meant to say something like "Add
> warning when decrementing a block group's reservations counter".

Right, I'll update it.

Thanks,
Qu

>
> thanks
>
>>
>> Add warning on for dec_block_group_reservations() if the reservations is
>> already minus.
>>
>> Although such warning doesn't always catch the directly caller, but should
>> provides good enough clue for later debug.
>>
>> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
>> ---
>> v2:
>>    Fix compile error
>>    Change to WARN_ON_ONCE(), as when it underflows, it will always under
>>    flow
>> ---
>>  fs/btrfs/extent-tree.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index 9424864..47ce96b 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -6243,6 +6243,7 @@ void btrfs_dec_block_group_reservations(struct btrfs_fs_info *fs_info,
>>         ASSERT(bg);
>>         if (atomic_dec_and_test(&bg->reservations))
>>                 wake_up_atomic_t(&bg->reservations);
>> +       WARN_ON_ONCE(atomic_read(&bg->reservations) < 0);
>>         btrfs_put_block_group(bg);
>>  }
>>
>> --
>> 2.8.3
>>
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch
diff mbox

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 9424864..47ce96b 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -6243,6 +6243,7 @@  void btrfs_dec_block_group_reservations(struct btrfs_fs_info *fs_info,
 	ASSERT(bg);
 	if (atomic_dec_and_test(&bg->reservations))
 		wake_up_atomic_t(&bg->reservations);
+	WARN_ON_ONCE(atomic_read(&bg->reservations) < 0);
 	btrfs_put_block_group(bg);
 }