Message ID | 20160526070608.10837-1-quwenruo@cn.fujitsu.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
In traditional file systems such as ext3/ext4 when a snapshot (say with LVM2) is taken, all I/O are frozen and entire file system including meta dta is part of snapshot. Btrfs snapshot also managed by file system itself and btrfs snapshot is actually does a sub volume snapshot which makes a portion of logical entity can be made read only so that no I/Os happen to that tree. Since snapshot is at sub volume level, other metadata such as extent tree, chunk tree, super block etc.. will still in active state and I/Os can go on them will not get flushed to disk (which is not required for btrfs sub volume snapshot). Since btrfs is always consistent (With COW and check-sum), does it holds good if I copy blocks at disk level to another disk starting form 0 to end of disk ? My guess is NO. Or is there a way to freeze I/Os at btrfs level so that entire metadata is flushed to the disk ( all trees such as extent tree, chunk tree or what ever metadata of btrfs file system used at VFS layer) so that disk to disk copy can be triggered so that I will be having all sub volumes are copied to an alternate disk which I should be able to use on another machine or if the original disk is gone for toss. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, May 26, 2016 at 08:47:39AM +0000, sri wrote: > In traditional file systems such as ext3/ext4 when a snapshot (say with > LVM2) is taken, all I/O are frozen and entire file system including meta > dta is part of snapshot. > > Btrfs snapshot also managed by file system itself and btrfs snapshot is > actually does a sub volume snapshot which makes a portion of logical > entity can be made read only so that no I/Os happen to that tree. > > Since snapshot is at sub volume level, other metadata such as extent tree, > chunk tree, super block etc.. will still in active state and I/Os can go > on them will not get flushed to disk (which is not required for btrfs sub > volume snapshot). > > > Since btrfs is always consistent (With COW and check-sum), does it holds > good if I copy blocks at disk level to another disk starting form 0 to end > of disk ? > My guess is NO. Correct. > Or is there a way to freeze I/Os at btrfs level so that entire metadata is > flushed to the disk ( all trees such as extent tree, chunk tree or what > ever metadata of btrfs file system used at VFS layer) so that disk to > disk copy can be triggered so that I will be having all sub volumes are > copied to an alternate disk which I should be able to use on another > machine or if the original disk is gone for toss. Naïvely, doing that within btrfs would effectively hang the FS for the lifetime of the suspension, so it would be a really bad idea. In general, this would be the use-case for LVM. However, that's a dangerous operation with btrfs, because the snapshot ends up being included in the original FS (because it has the same UUID), and thus causes the kernel to get very confused, leading to probable FS corruption. Hugo.
On Thu, May 26, 2016 at 8:06 AM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote: > Since the we are using atomic and wait queue for block group > reservations and it's not controlled by lockdep, we need pay much more > attention to any modification to write path. > > Or it's very easy to under flow block group reservations and cause lock > balance. Ok, have you observed this happening in practice? There's nothing wrong with adding this warning, but it makes me curious why you do it only here, specially if you haven't experienced a bug. That is, there are other places (some recent others not so recent) where we use an atomic for waiting and waking up. Also the patch subject "btrfs: Add debug warning for new block group reservations" is very confusing. You meant to say something like "Add warning when decrementing a block group's reservations counter". thanks > > Add warning on for dec_block_group_reservations() if the reservations is > already minus. > > Although such warning doesn't always catch the directly caller, but should > provides good enough clue for later debug. > > Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> > --- > v2: > Fix compile error > Change to WARN_ON_ONCE(), as when it underflows, it will always under > flow > --- > fs/btrfs/extent-tree.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c > index 9424864..47ce96b 100644 > --- a/fs/btrfs/extent-tree.c > +++ b/fs/btrfs/extent-tree.c > @@ -6243,6 +6243,7 @@ void btrfs_dec_block_group_reservations(struct btrfs_fs_info *fs_info, > ASSERT(bg); > if (atomic_dec_and_test(&bg->reservations)) > wake_up_atomic_t(&bg->reservations); > + WARN_ON_ONCE(atomic_read(&bg->reservations) < 0); > btrfs_put_block_group(bg); > } > > -- > 2.8.3 > > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Filipe Manana wrote on 2016/05/27 10:51 +0100: > On Thu, May 26, 2016 at 8:06 AM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote: >> Since the we are using atomic and wait queue for block group >> reservations and it's not controlled by lockdep, we need pay much more >> attention to any modification to write path. >> >> Or it's very easy to under flow block group reservations and cause lock >> balance. > > Ok, have you observed this happening in practice? Yes, when rebasing dedupe patchset. Git rebasing(to for-linus-4.7) without extra modification, will cause balance to be blocked with dedupe test case btrfs/203. For dedupe, cow_file_range() may call dec_block_group_reserverations() while the data extent is not newly allocated, but reusing existing one. So there dev_block_group_reservations() will under flow reservations. And blocked backtrace leads to block group reservations. With this patch, it takes much less time to fix dedupe patchset. And I assume later modification to run_delalloc can easily cause such problem, so added this patch. > > There's nothing wrong with adding this warning, but it makes me > curious why you do it only here, specially if you haven't experienced > a bug. That is, there are other places (some recent others not so > recent) where we use an atomic for waiting and waking up. > > Also the patch subject "btrfs: Add debug warning for new block group > reservations" is very confusing. You meant to say something like "Add > warning when decrementing a block group's reservations counter". Right, I'll update it. Thanks, Qu > > thanks > >> >> Add warning on for dec_block_group_reservations() if the reservations is >> already minus. >> >> Although such warning doesn't always catch the directly caller, but should >> provides good enough clue for later debug. >> >> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> >> --- >> v2: >> Fix compile error >> Change to WARN_ON_ONCE(), as when it underflows, it will always under >> flow >> --- >> fs/btrfs/extent-tree.c | 1 + >> 1 file changed, 1 insertion(+) >> >> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c >> index 9424864..47ce96b 100644 >> --- a/fs/btrfs/extent-tree.c >> +++ b/fs/btrfs/extent-tree.c >> @@ -6243,6 +6243,7 @@ void btrfs_dec_block_group_reservations(struct btrfs_fs_info *fs_info, >> ASSERT(bg); >> if (atomic_dec_and_test(&bg->reservations)) >> wake_up_atomic_t(&bg->reservations); >> + WARN_ON_ONCE(atomic_read(&bg->reservations) < 0); >> btrfs_put_block_group(bg); >> } >> >> -- >> 2.8.3 >> >> >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 9424864..47ce96b 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -6243,6 +6243,7 @@ void btrfs_dec_block_group_reservations(struct btrfs_fs_info *fs_info, ASSERT(bg); if (atomic_dec_and_test(&bg->reservations)) wake_up_atomic_t(&bg->reservations); + WARN_ON_ONCE(atomic_read(&bg->reservations) < 0); btrfs_put_block_group(bg); }
Since the we are using atomic and wait queue for block group reservations and it's not controlled by lockdep, we need pay much more attention to any modification to write path. Or it's very easy to under flow block group reservations and cause lock balance. Add warning on for dec_block_group_reservations() if the reservations is already minus. Although such warning doesn't always catch the directly caller, but should provides good enough clue for later debug. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> --- v2: Fix compile error Change to WARN_ON_ONCE(), as when it underflows, it will always under flow --- fs/btrfs/extent-tree.c | 1 + 1 file changed, 1 insertion(+)