diff mbox

Btrfs: correctly determine if blocks are shared in btrfs_compare_trees

Message ID 1392930925-5348-1-git-send-email-fdmanana@gmail.com (mailing list archive)
State Accepted, archived
Headers show

Commit Message

Filipe Manana Feb. 20, 2014, 9:15 p.m. UTC
Just comparing the pointers (logical disk addresses) of the btree nodes is
not completely bullet proof, we have to check if their generation numbers
match too.

It is guaranteed that a COW operation will result in a block with a different
logical disk address than the original block's address, but over time we can
reuse that former logical disk address.

For example, creating a 2Gb filesystem on a loop device, and having a script
running in a loop always updating the access timestamp of a file, resulted in
the same logical disk address being reused for the same fs btree block in about
only 4 minutes.

This could make us skip entire subtrees when doing an incremental send (which
is currently the only user of btrfs_compare_trees). However the odds of getting
2 blocks at the same tree level, with the same logical disk address, equal first
slot keys and different generations, should hopefully be very low.

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
---
 fs/btrfs/ctree.c |   11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

Comments

Alex Lyakas April 5, 2014, 6:22 p.m. UTC | #1
Hi Filipe,
Can you please explain more what is the scenario you are worried about.

Let's say we have two FS trees (subvolumes) subv1 and subv2, subv2
being a RO snapshot of subv1. And they have a shared subtree at
logical==X. Now we change subv1, so its subtree is COW'ed and some
other logical address (Y) is being allocated for subtree root. But X
still cannot be reused as long as subv2 exists. That's the essence of
the extent tree providing refcount for each tree/data block in the FS,
no?

Now finally we delete subv2 and block X is freed. So it can be
reallocated as a root of another subtree. And then it might be
snapshotted again and shared as before.
So where do you see a problem?

If we have two FS-tree subtrees starting at the same logical=X, how
can they be different? This means we allocated logical=X again, while
it was still in use, which is very very bad.

Am I missing something here?

Thanks,
Alex.

P.S.: by "logical" I (and hopefully you) mean the extent-tree level
addresses, i.e., if we have a tree block with logical=X, then we also
have an EXTENT_ITEM with key (X, EXTENT_ITEM, nodesize/leafsize).


On Fri, Feb 21, 2014 at 12:15 AM, Filipe David Borba Manana
<fdmanana@gmail.com> wrote:
> Just comparing the pointers (logical disk addresses) of the btree nodes is
> not completely bullet proof, we have to check if their generation numbers
> match too.
>
> It is guaranteed that a COW operation will result in a block with a different
> logical disk address than the original block's address, but over time we can
> reuse that former logical disk address.
>
> For example, creating a 2Gb filesystem on a loop device, and having a script
> running in a loop always updating the access timestamp of a file, resulted in
> the same logical disk address being reused for the same fs btree block in about
> only 4 minutes.
>
> This could make us skip entire subtrees when doing an incremental send (which
> is currently the only user of btrfs_compare_trees). However the odds of getting
> 2 blocks at the same tree level, with the same logical disk address, equal first
> slot keys and different generations, should hopefully be very low.
>
> Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
> ---
>  fs/btrfs/ctree.c |   11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index cbd3a7d..88d1b1e 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -5376,6 +5376,8 @@ int btrfs_compare_trees(struct btrfs_root *left_root,
>         int advance_right;
>         u64 left_blockptr;
>         u64 right_blockptr;
> +       u64 left_gen;
> +       u64 right_gen;
>         u64 left_start_ctransid;
>         u64 right_start_ctransid;
>         u64 ctransid;
> @@ -5640,7 +5642,14 @@ int btrfs_compare_trees(struct btrfs_root *left_root,
>                                 right_blockptr = btrfs_node_blockptr(
>                                                 right_path->nodes[right_level],
>                                                 right_path->slots[right_level]);
> -                               if (left_blockptr == right_blockptr) {
> +                               left_gen = btrfs_node_ptr_generation(
> +                                               left_path->nodes[left_level],
> +                                               left_path->slots[left_level]);
> +                               right_gen = btrfs_node_ptr_generation(
> +                                               right_path->nodes[right_level],
> +                                               right_path->slots[right_level]);
> +                               if (left_blockptr == right_blockptr &&
> +                                   left_gen == right_gen) {
>                                         /*
>                                          * As we're on a shared block, don't
>                                          * allow to go deeper.
> --
> 1.7.9.5
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index cbd3a7d..88d1b1e 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -5376,6 +5376,8 @@  int btrfs_compare_trees(struct btrfs_root *left_root,
 	int advance_right;
 	u64 left_blockptr;
 	u64 right_blockptr;
+	u64 left_gen;
+	u64 right_gen;
 	u64 left_start_ctransid;
 	u64 right_start_ctransid;
 	u64 ctransid;
@@ -5640,7 +5642,14 @@  int btrfs_compare_trees(struct btrfs_root *left_root,
 				right_blockptr = btrfs_node_blockptr(
 						right_path->nodes[right_level],
 						right_path->slots[right_level]);
-				if (left_blockptr == right_blockptr) {
+				left_gen = btrfs_node_ptr_generation(
+						left_path->nodes[left_level],
+						left_path->slots[left_level]);
+				right_gen = btrfs_node_ptr_generation(
+						right_path->nodes[right_level],
+						right_path->slots[right_level]);
+				if (left_blockptr == right_blockptr &&
+				    left_gen == right_gen) {
 					/*
 					 * As we're on a shared block, don't
 					 * allow to go deeper.