diff mbox series

btrfs: Use transid for DIR_ITEM/DIR_INDEX's location

Message ID 20200828132010.27886-1-nborisov@suse.com (mailing list archive)
State New, archived
Headers show
Series btrfs: Use transid for DIR_ITEM/DIR_INDEX's location | expand

Commit Message

Nikolay Borisov Aug. 28, 2020, 1:20 p.m. UTC
When a snapshot is created its root item is inserted in the root tree
with the 'offset' field set to the transaction id when the snapshot
was created. Immediately afterwards the offset is set to -1 so when
the same key is used to create DIR_ITEM/DIR_INDEX in the destination
file tree its offset is really set to -1. Root tree item:

    item 13 key (258 ROOT_ITEM 7) itemoff 12744 itemsize 439
        generation 7 root_dirid 256 bytenr 30703616 level 0 refs 1
        lastsnap 7 byte_limit 0 bytes_used 16384 flags 0x0(none)
        uuid f13abf0d-b1f5-f34b-a179-fd1c2f89e762
        parent_uuid 51a74677-a077-4c21-bd87-2141a147ff85
        ctransid 7 otransid 7 stransid 0 rtransid 0
        ctime 1598441149.466822752 (2020-08-26 11:25:49)
        otime 1598441149.467474846 (2020-08-26 11:25:49)
        drop key (0 UNKNOWN.0 0) level 0

DIR_INDEX item for the same rooti in the destination fs tree:

item 5 key (256 DIR_INDEX 9) itemoff 15967 itemsize 39
        location key (258 ROOT_ITEM 18446744073709551615) type DIR
        transid 7 data_len 0 name_len 9
        name: snapshot1

The location key is generally used to read the root. This is not a
problem per-se since the function dealing with root searching
(btrfs_find_root) is well equipped to deal with offset being -1, namely:

    If ->offset of 'search_key' is -1ULL, it means we are not sure the
    offset of the search key, just lookup the root with the highest
    offset for a given objectid.

However this is a needless inconcistency in the way internal data
structures are being created. This patch modifies the behavior so that
DIR_INDEX/DIR_ITEM will have the offset field of the location key set
to the transid. While this results in a change of the on-disk metadata,
it doesn't constitute a functional change since older kernels can cope
with both '-1' and transid as values of the offset field so no INCOMPAT
flags are needed.

Finally while at it also move the initialization of the key in
create_pending_snapshot closer to where it's being used for the first
time.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/transaction.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

Comments

Josef Bacik Aug. 31, 2020, 3:39 p.m. UTC | #1
On 8/28/20 9:20 AM, Nikolay Borisov wrote:
> When a snapshot is created its root item is inserted in the root tree
> with the 'offset' field set to the transaction id when the snapshot
> was created. Immediately afterwards the offset is set to -1 so when
> the same key is used to create DIR_ITEM/DIR_INDEX in the destination
> file tree its offset is really set to -1. Root tree item:
> 
>      item 13 key (258 ROOT_ITEM 7) itemoff 12744 itemsize 439
>          generation 7 root_dirid 256 bytenr 30703616 level 0 refs 1
>          lastsnap 7 byte_limit 0 bytes_used 16384 flags 0x0(none)
>          uuid f13abf0d-b1f5-f34b-a179-fd1c2f89e762
>          parent_uuid 51a74677-a077-4c21-bd87-2141a147ff85
>          ctransid 7 otransid 7 stransid 0 rtransid 0
>          ctime 1598441149.466822752 (2020-08-26 11:25:49)
>          otime 1598441149.467474846 (2020-08-26 11:25:49)
>          drop key (0 UNKNOWN.0 0) level 0
> 
> DIR_INDEX item for the same rooti in the destination fs tree:
> 
> item 5 key (256 DIR_INDEX 9) itemoff 15967 itemsize 39
>          location key (258 ROOT_ITEM 18446744073709551615) type DIR
>          transid 7 data_len 0 name_len 9
>          name: snapshot1
> 
> The location key is generally used to read the root. This is not a
> problem per-se since the function dealing with root searching
> (btrfs_find_root) is well equipped to deal with offset being -1, namely:
> 
>      If ->offset of 'search_key' is -1ULL, it means we are not sure the
>      offset of the search key, just lookup the root with the highest
>      offset for a given objectid.
> 
> However this is a needless inconcistency in the way internal data
> structures are being created. This patch modifies the behavior so that
> DIR_INDEX/DIR_ITEM will have the offset field of the location key set
> to the transid. While this results in a change of the on-disk metadata,
> it doesn't constitute a functional change since older kernels can cope
> with both '-1' and transid as values of the offset field so no INCOMPAT
> flags are needed.
> 
> Finally while at it also move the initialization of the key in
> create_pending_snapshot closer to where it's being used for the first
> time.
> 
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef
Nikolay Borisov Nov. 2, 2020, 2:17 p.m. UTC | #2
On 28.08.20 г. 16:20 ч., Nikolay Borisov wrote:
> When a snapshot is created its root item is inserted in the root tree
> with the 'offset' field set to the transaction id when the snapshot
> was created. Immediately afterwards the offset is set to -1 so when
> the same key is used to create DIR_ITEM/DIR_INDEX in the destination
> file tree its offset is really set to -1. Root tree item:
> 
>     item 13 key (258 ROOT_ITEM 7) itemoff 12744 itemsize 439
>         generation 7 root_dirid 256 bytenr 30703616 level 0 refs 1
>         lastsnap 7 byte_limit 0 bytes_used 16384 flags 0x0(none)
>         uuid f13abf0d-b1f5-f34b-a179-fd1c2f89e762
>         parent_uuid 51a74677-a077-4c21-bd87-2141a147ff85
>         ctransid 7 otransid 7 stransid 0 rtransid 0
>         ctime 1598441149.466822752 (2020-08-26 11:25:49)
>         otime 1598441149.467474846 (2020-08-26 11:25:49)
>         drop key (0 UNKNOWN.0 0) level 0
> 
> DIR_INDEX item for the same rooti in the destination fs tree:
> 
> item 5 key (256 DIR_INDEX 9) itemoff 15967 itemsize 39
>         location key (258 ROOT_ITEM 18446744073709551615) type DIR
>         transid 7 data_len 0 name_len 9
>         name: snapshot1
> 
> The location key is generally used to read the root. This is not a
> problem per-se since the function dealing with root searching
> (btrfs_find_root) is well equipped to deal with offset being -1, namely:
> 
>     If ->offset of 'search_key' is -1ULL, it means we are not sure the
>     offset of the search key, just lookup the root with the highest
>     offset for a given objectid.
> 
> However this is a needless inconcistency in the way internal data
> structures are being created. This patch modifies the behavior so that
> DIR_INDEX/DIR_ITEM will have the offset field of the location key set
> to the transid. While this results in a change of the on-disk metadata,
> it doesn't constitute a functional change since older kernels can cope
> with both '-1' and transid as values of the offset field so no INCOMPAT
> flags are needed.
> 
> Finally while at it also move the initialization of the key in
> create_pending_snapshot closer to where it's being used for the first
> time.
> 
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>


Ping on that, it's been reviewed by Josef but seems to have been missed.
Nikolay Borisov Dec. 18, 2020, 8:58 a.m. UTC | #3
On 28.08.20 г. 16:20 ч., Nikolay Borisov wrote:
> When a snapshot is created its root item is inserted in the root tree
> with the 'offset' field set to the transaction id when the snapshot
> was created. Immediately afterwards the offset is set to -1 so when
> the same key is used to create DIR_ITEM/DIR_INDEX in the destination
> file tree its offset is really set to -1. Root tree item:
> 
>     item 13 key (258 ROOT_ITEM 7) itemoff 12744 itemsize 439
>         generation 7 root_dirid 256 bytenr 30703616 level 0 refs 1
>         lastsnap 7 byte_limit 0 bytes_used 16384 flags 0x0(none)
>         uuid f13abf0d-b1f5-f34b-a179-fd1c2f89e762
>         parent_uuid 51a74677-a077-4c21-bd87-2141a147ff85
>         ctransid 7 otransid 7 stransid 0 rtransid 0
>         ctime 1598441149.466822752 (2020-08-26 11:25:49)
>         otime 1598441149.467474846 (2020-08-26 11:25:49)
>         drop key (0 UNKNOWN.0 0) level 0
> 
> DIR_INDEX item for the same rooti in the destination fs tree:
> 
> item 5 key (256 DIR_INDEX 9) itemoff 15967 itemsize 39
>         location key (258 ROOT_ITEM 18446744073709551615) type DIR
>         transid 7 data_len 0 name_len 9
>         name: snapshot1
> 
> The location key is generally used to read the root. This is not a
> problem per-se since the function dealing with root searching
> (btrfs_find_root) is well equipped to deal with offset being -1, namely:
> 
>     If ->offset of 'search_key' is -1ULL, it means we are not sure the
>     offset of the search key, just lookup the root with the highest
>     offset for a given objectid.
> 
> However this is a needless inconcistency in the way internal data
> structures are being created. This patch modifies the behavior so that
> DIR_INDEX/DIR_ITEM will have the offset field of the location key set
> to the transid. While this results in a change of the on-disk metadata,
> it doesn't constitute a functional change since older kernels can cope
> with both '-1' and transid as values of the offset field so no INCOMPAT
> flags are needed.
> 
> Finally while at it also move the initialization of the key in
> create_pending_snapshot closer to where it's being used for the first
> time.
> 
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>

Yet another ping
David Sterba Feb. 25, 2021, 3:52 p.m. UTC | #4
On Fri, Aug 28, 2020 at 04:20:10PM +0300, Nikolay Borisov wrote:
> When a snapshot is created its root item is inserted in the root tree
> with the 'offset' field set to the transaction id when the snapshot
> was created. Immediately afterwards the offset is set to -1 so when
> the same key is used to create DIR_ITEM/DIR_INDEX in the destination
> file tree its offset is really set to -1. Root tree item:
> 
>     item 13 key (258 ROOT_ITEM 7) itemoff 12744 itemsize 439
>         generation 7 root_dirid 256 bytenr 30703616 level 0 refs 1
>         lastsnap 7 byte_limit 0 bytes_used 16384 flags 0x0(none)
>         uuid f13abf0d-b1f5-f34b-a179-fd1c2f89e762
>         parent_uuid 51a74677-a077-4c21-bd87-2141a147ff85
>         ctransid 7 otransid 7 stransid 0 rtransid 0
>         ctime 1598441149.466822752 (2020-08-26 11:25:49)
>         otime 1598441149.467474846 (2020-08-26 11:25:49)
>         drop key (0 UNKNOWN.0 0) level 0
> 
> DIR_INDEX item for the same rooti in the destination fs tree:
> 
> item 5 key (256 DIR_INDEX 9) itemoff 15967 itemsize 39
>         location key (258 ROOT_ITEM 18446744073709551615) type DIR
>         transid 7 data_len 0 name_len 9
>         name: snapshot1
> 
> The location key is generally used to read the root. This is not a
> problem per-se since the function dealing with root searching
> (btrfs_find_root) is well equipped to deal with offset being -1, namely:
> 
>     If ->offset of 'search_key' is -1ULL, it means we are not sure the
>     offset of the search key, just lookup the root with the highest
>     offset for a given objectid.
> 
> However this is a needless inconcistency in the way internal data
> structures are being created. This patch modifies the behavior so that
> DIR_INDEX/DIR_ITEM will have the offset field of the location key set
> to the transid. While this results in a change of the on-disk metadata,
> it doesn't constitute a functional change since older kernels can cope
> with both '-1' and transid as values of the offset field so no INCOMPAT
> flags are needed.
> 
> Finally while at it also move the initialization of the key in
> create_pending_snapshot closer to where it's being used for the first
> time.

For the record I'll reply here too.

This is technically correct and probably wouldn't cause any noticeable
problem. The main concern here is that it's changing the item data from
a format that's been out there forever. Once this patch would land, the
dir items will have a different location key, making it inconsistent.

That the root item key itself has the transaction id is part of the
format, that dir item location keys have the -1 in the offset is as
well.  That the two are different and not the same is maybe unfortunate
but certainly not harmful.

Changing the format even a little always brings a risk and we never know
in advance. We only learn from mistakes here and did a few in the past.
My gut feeling is to keep it as-is, or do it properly with incompat
bits if needed. In this case it's not necessary so we'll keep things
unchanged.

> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
> ---
>  fs/btrfs/transaction.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index 7243532cd34b..83f4e7056488 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -1509,9 +1509,6 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
>  			goto clear_skip_qgroup;
>  	}
>  
> -	key.objectid = objectid;
> -	key.offset = (u64)-1;
> -	key.type = BTRFS_ROOT_ITEM_KEY;
>  
>  	rsv = trans->block_rsv;
>  	trans->block_rsv = &pending->block_rsv;
> @@ -1613,7 +1610,9 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
>  
>  	btrfs_set_root_node(new_root_item, tmp);
>  	/* record when the snapshot was created in key.offset */
> +	key.objectid = objectid;
>  	key.offset = trans->transid;
> +	key.type = BTRFS_ROOT_ITEM_KEY;

This can be sent as a cleanup, it's better to keep the key intialization
at one place.

>  	ret = btrfs_insert_root(trans, tree_root, &key, new_root_item);
>  	btrfs_tree_unlock(tmp);
>  	free_extent_buffer(tmp);
> @@ -1634,7 +1633,6 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
>  		goto fail;
>  	}
>  
> -	key.offset = (u64)-1;

And a comment explaining the difference between the root item and dir
item keys.

>  	pending->snap = btrfs_get_new_fs_root(fs_info, objectid, pending->anon_dev);
>  	if (IS_ERR(pending->snap)) {
>  		ret = PTR_ERR(pending->snap);
> -- 
> 2.17.1
diff mbox series

Patch

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 7243532cd34b..83f4e7056488 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1509,9 +1509,6 @@  static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 			goto clear_skip_qgroup;
 	}
 
-	key.objectid = objectid;
-	key.offset = (u64)-1;
-	key.type = BTRFS_ROOT_ITEM_KEY;
 
 	rsv = trans->block_rsv;
 	trans->block_rsv = &pending->block_rsv;
@@ -1613,7 +1610,9 @@  static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 
 	btrfs_set_root_node(new_root_item, tmp);
 	/* record when the snapshot was created in key.offset */
+	key.objectid = objectid;
 	key.offset = trans->transid;
+	key.type = BTRFS_ROOT_ITEM_KEY;
 	ret = btrfs_insert_root(trans, tree_root, &key, new_root_item);
 	btrfs_tree_unlock(tmp);
 	free_extent_buffer(tmp);
@@ -1634,7 +1633,6 @@  static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 		goto fail;
 	}
 
-	key.offset = (u64)-1;
 	pending->snap = btrfs_get_new_fs_root(fs_info, objectid, pending->anon_dev);
 	if (IS_ERR(pending->snap)) {
 		ret = PTR_ERR(pending->snap);