Message ID | d433af292d5e99ff194bc6362133e64704ecd006.1629486429.git.josef@toxicpanda.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | btrfs-progs: mkfs fixes and enhancements for extent tree v2 | expand |
On 2021/8/21 上午3:11, Josef Bacik wrote: > Currently we build a bare-bones file system in make_btrfs(), and then we > load it up and fill in the rest of the file system after the fact. One > thing we omit in make_btrfs() is the block group item for the temporary > system chunk we allocate, because we just add it after we've opened the > file system. > > However I want to be able to generate the free space tree at > make_btrfs() time, because extent tree v2 will not have an extent tree > that has every block allocated in the system. In order to do this I > need to make sure that the free space tree entries are added on block > group creation, which is annoying if we have to add this chunk after > I've created a free space tree. > > So make future work simpler by simply adding our block group item at > make_btrfs() time, this way I can do the right things with the free > space tree in the generic make block group code without needing a > special case for our temporary system chunk. > > Signed-off-by: Josef Bacik <josef@toxicpanda.com> > --- > mkfs/common.c | 31 +++++++++++++++++++++++++++++++ > mkfs/main.c | 9 ++------- > 2 files changed, 33 insertions(+), 7 deletions(-) > > diff --git a/mkfs/common.c b/mkfs/common.c > index 9263965e..cba97687 100644 > --- a/mkfs/common.c > +++ b/mkfs/common.c > @@ -190,6 +190,7 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg) > u64 num_bytes; > u64 system_group_offset = BTRFS_BLOCK_RESERVED_1M_FOR_SUPER; > u64 system_group_size = BTRFS_MKFS_SYSTEM_GROUP_SIZE; > + bool add_block_group = true; > > if ((cfg->features & BTRFS_FEATURE_INCOMPAT_ZONED)) { > system_group_offset = cfg->zone_size * BTRFS_NR_SB_LOG_ZONES; > @@ -283,6 +284,36 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg) > if (blk == MKFS_SUPER_BLOCK) > continue; > > + /* Add the block group item for our temporary chunk. */ > + if (cfg->blocks[blk] > system_group_offset && > + add_block_group) { This makes the block group item always be the first item. But for skinny metadata, METADATA_ITEM is smaller than BLOCK_GROUP_ITEM, meaning it should be before BLOCK_GROUP_ITEM. Won't this cause the extent tree has a bad key order? BTW, if we get rid of the superblock in cfg->blocks[], we don't need to check against system_group_offset. Thanks, Qu > + struct btrfs_block_group_item *bg_item; > + > + add_block_group = false; > + > + itemoff -= sizeof(*bg_item); > + btrfs_set_disk_key_objectid(&disk_key, > + system_group_offset); > + btrfs_set_disk_key_offset(&disk_key, > + system_group_size); > + btrfs_set_disk_key_type(&disk_key, > + BTRFS_BLOCK_GROUP_ITEM_KEY); > + btrfs_set_item_key(buf, &disk_key, nritems); > + btrfs_set_item_offset(buf, btrfs_item_nr(nritems), > + itemoff); > + btrfs_set_item_size(buf, btrfs_item_nr(nritems), > + sizeof(*bg_item)); > + > + bg_item = btrfs_item_ptr(buf, nritems, > + struct btrfs_block_group_item); > + btrfs_set_block_group_used(buf, bg_item, total_used); > + btrfs_set_block_group_flags(buf, bg_item, > + BTRFS_BLOCK_GROUP_SYSTEM); > + btrfs_set_block_group_chunk_objectid(buf, bg_item, > + BTRFS_FIRST_CHUNK_TREE_OBJECTID); > + nritems++; > + } > + > item_size = sizeof(struct btrfs_extent_item); > if (!skinny_metadata) > item_size += sizeof(struct btrfs_tree_block_info); > diff --git a/mkfs/main.c b/mkfs/main.c > index eab93eb3..ea53e9c7 100644 > --- a/mkfs/main.c > +++ b/mkfs/main.c > @@ -67,7 +67,6 @@ static int create_metadata_block_groups(struct btrfs_root *root, int mixed, > struct btrfs_trans_handle *trans; > struct btrfs_space_info *sinfo; > u64 flags = BTRFS_BLOCK_GROUP_METADATA; > - u64 bytes_used; > u64 chunk_start = 0; > u64 chunk_size = 0; > u64 system_group_offset = BTRFS_BLOCK_RESERVED_1M_FOR_SUPER; > @@ -90,16 +89,12 @@ static int create_metadata_block_groups(struct btrfs_root *root, int mixed, > > trans = btrfs_start_transaction(root, 1); > BUG_ON(IS_ERR(trans)); > - bytes_used = btrfs_super_bytes_used(fs_info->super_copy); > > root->fs_info->system_allocs = 1; > /* > - * First temporary system chunk must match the chunk layout > - * created in make_btrfs(). > + * We already created the block group item for our temporary system > + * chunk in make_btrfs(), so account for the size here. > */ > - ret = btrfs_make_block_group(trans, fs_info, bytes_used, > - BTRFS_BLOCK_GROUP_SYSTEM, > - system_group_offset, system_group_size); > allocation->system += system_group_size; > if (ret) > return ret; >
On 8/23/21 5:00 AM, Qu Wenruo wrote: > > > On 2021/8/21 上午3:11, Josef Bacik wrote: >> Currently we build a bare-bones file system in make_btrfs(), and then we >> load it up and fill in the rest of the file system after the fact. One >> thing we omit in make_btrfs() is the block group item for the temporary >> system chunk we allocate, because we just add it after we've opened the >> file system. >> >> However I want to be able to generate the free space tree at >> make_btrfs() time, because extent tree v2 will not have an extent tree >> that has every block allocated in the system. In order to do this I >> need to make sure that the free space tree entries are added on block >> group creation, which is annoying if we have to add this chunk after >> I've created a free space tree. >> >> So make future work simpler by simply adding our block group item at >> make_btrfs() time, this way I can do the right things with the free >> space tree in the generic make block group code without needing a >> special case for our temporary system chunk. >> >> Signed-off-by: Josef Bacik <josef@toxicpanda.com> >> --- >> mkfs/common.c | 31 +++++++++++++++++++++++++++++++ >> mkfs/main.c | 9 ++------- >> 2 files changed, 33 insertions(+), 7 deletions(-) >> >> diff --git a/mkfs/common.c b/mkfs/common.c >> index 9263965e..cba97687 100644 >> --- a/mkfs/common.c >> +++ b/mkfs/common.c >> @@ -190,6 +190,7 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg) >> u64 num_bytes; >> u64 system_group_offset = BTRFS_BLOCK_RESERVED_1M_FOR_SUPER; >> u64 system_group_size = BTRFS_MKFS_SYSTEM_GROUP_SIZE; >> + bool add_block_group = true; >> >> if ((cfg->features & BTRFS_FEATURE_INCOMPAT_ZONED)) { >> system_group_offset = cfg->zone_size * BTRFS_NR_SB_LOG_ZONES; >> @@ -283,6 +284,36 @@ int make_btrfs(int fd, struct btrfs_mkfs_config >> *cfg) >> if (blk == MKFS_SUPER_BLOCK) >> continue; >> >> + /* Add the block group item for our temporary chunk. */ >> + if (cfg->blocks[blk] > system_group_offset && >> + add_block_group) { > > This makes the block group item always be the first item. > > But for skinny metadata, METADATA_ITEM is smaller than BLOCK_GROUP_ITEM, > meaning it should be before BLOCK_GROUP_ITEM. > > Won't this cause the extent tree has a bad key order? > No because it's based on the actual bytenr. We'll insert the extent item entry first, and then move to the next block and if it's past the first block we add the block group item, and then the actual extent item. So it goes first block X gets extent item inserted X+1 > X, insert block group item insert X+1 extent item. Thanks, Josef
On 2021/8/24 上午4:04, Josef Bacik wrote: > On 8/23/21 5:00 AM, Qu Wenruo wrote: >> >> >> On 2021/8/21 上午3:11, Josef Bacik wrote: >>> Currently we build a bare-bones file system in make_btrfs(), and then we >>> load it up and fill in the rest of the file system after the fact. One >>> thing we omit in make_btrfs() is the block group item for the temporary >>> system chunk we allocate, because we just add it after we've opened the >>> file system. >>> >>> However I want to be able to generate the free space tree at >>> make_btrfs() time, because extent tree v2 will not have an extent tree >>> that has every block allocated in the system. In order to do this I >>> need to make sure that the free space tree entries are added on block >>> group creation, which is annoying if we have to add this chunk after >>> I've created a free space tree. >>> >>> So make future work simpler by simply adding our block group item at >>> make_btrfs() time, this way I can do the right things with the free >>> space tree in the generic make block group code without needing a >>> special case for our temporary system chunk. >>> >>> Signed-off-by: Josef Bacik <josef@toxicpanda.com> >>> --- >>> mkfs/common.c | 31 +++++++++++++++++++++++++++++++ >>> mkfs/main.c | 9 ++------- >>> 2 files changed, 33 insertions(+), 7 deletions(-) >>> >>> diff --git a/mkfs/common.c b/mkfs/common.c >>> index 9263965e..cba97687 100644 >>> --- a/mkfs/common.c >>> +++ b/mkfs/common.c >>> @@ -190,6 +190,7 @@ int make_btrfs(int fd, struct btrfs_mkfs_config >>> *cfg) >>> u64 num_bytes; >>> u64 system_group_offset = BTRFS_BLOCK_RESERVED_1M_FOR_SUPER; >>> u64 system_group_size = BTRFS_MKFS_SYSTEM_GROUP_SIZE; >>> + bool add_block_group = true; >>> >>> if ((cfg->features & BTRFS_FEATURE_INCOMPAT_ZONED)) { >>> system_group_offset = cfg->zone_size * BTRFS_NR_SB_LOG_ZONES; >>> @@ -283,6 +284,36 @@ int make_btrfs(int fd, struct btrfs_mkfs_config >>> *cfg) >>> if (blk == MKFS_SUPER_BLOCK) >>> continue; >>> >>> + /* Add the block group item for our temporary chunk. */ >>> + if (cfg->blocks[blk] > system_group_offset && >>> + add_block_group) { >> >> This makes the block group item always be the first item. >> >> But for skinny metadata, METADATA_ITEM is smaller than BLOCK_GROUP_ITEM, >> meaning it should be before BLOCK_GROUP_ITEM. >> >> Won't this cause the extent tree has a bad key order? >> > > No because it's based on the actual bytenr. We'll insert the extent > item entry first, and then move to the next block and if it's past the > first block we add the block group item, and then the actual extent > item. So it goes > > first block X gets extent item inserted > X+1 > X, insert block group item > insert X+1 extent item. > But then what if we go non-skinny metadata? Then block group item is always before any extent item. Thanks, Qu > Thanks, > > Josef
On 8/23/21 7:37 PM, Qu Wenruo wrote: > > > On 2021/8/24 上午4:04, Josef Bacik wrote: >> On 8/23/21 5:00 AM, Qu Wenruo wrote: >>> >>> >>> On 2021/8/21 上午3:11, Josef Bacik wrote: >>>> Currently we build a bare-bones file system in make_btrfs(), and >>>> then we >>>> load it up and fill in the rest of the file system after the fact. One >>>> thing we omit in make_btrfs() is the block group item for the temporary >>>> system chunk we allocate, because we just add it after we've opened the >>>> file system. >>>> >>>> However I want to be able to generate the free space tree at >>>> make_btrfs() time, because extent tree v2 will not have an extent tree >>>> that has every block allocated in the system. In order to do this I >>>> need to make sure that the free space tree entries are added on block >>>> group creation, which is annoying if we have to add this chunk after >>>> I've created a free space tree. >>>> >>>> So make future work simpler by simply adding our block group item at >>>> make_btrfs() time, this way I can do the right things with the free >>>> space tree in the generic make block group code without needing a >>>> special case for our temporary system chunk. >>>> >>>> Signed-off-by: Josef Bacik <josef@toxicpanda.com> >>>> --- >>>> mkfs/common.c | 31 +++++++++++++++++++++++++++++++ >>>> mkfs/main.c | 9 ++------- >>>> 2 files changed, 33 insertions(+), 7 deletions(-) >>>> >>>> diff --git a/mkfs/common.c b/mkfs/common.c >>>> index 9263965e..cba97687 100644 >>>> --- a/mkfs/common.c >>>> +++ b/mkfs/common.c >>>> @@ -190,6 +190,7 @@ int make_btrfs(int fd, struct btrfs_mkfs_config >>>> *cfg) >>>> u64 num_bytes; >>>> u64 system_group_offset = BTRFS_BLOCK_RESERVED_1M_FOR_SUPER; >>>> u64 system_group_size = BTRFS_MKFS_SYSTEM_GROUP_SIZE; >>>> + bool add_block_group = true; >>>> >>>> if ((cfg->features & BTRFS_FEATURE_INCOMPAT_ZONED)) { >>>> system_group_offset = cfg->zone_size * BTRFS_NR_SB_LOG_ZONES; >>>> @@ -283,6 +284,36 @@ int make_btrfs(int fd, struct btrfs_mkfs_config >>>> *cfg) >>>> if (blk == MKFS_SUPER_BLOCK) >>>> continue; >>>> >>>> + /* Add the block group item for our temporary chunk. */ >>>> + if (cfg->blocks[blk] > system_group_offset && >>>> + add_block_group) { >>> >>> This makes the block group item always be the first item. >>> >>> But for skinny metadata, METADATA_ITEM is smaller than BLOCK_GROUP_ITEM, >>> meaning it should be before BLOCK_GROUP_ITEM. >>> >>> Won't this cause the extent tree has a bad key order? >>> >> >> No because it's based on the actual bytenr. We'll insert the extent >> item entry first, and then move to the next block and if it's past the >> first block we add the block group item, and then the actual extent >> item. So it goes >> >> first block X gets extent item inserted >> X+1 > X, insert block group item >> insert X+1 extent item. >> > > But then what if we go non-skinny metadata? > > Then block group item is always before any extent item. > item 0 key (1048576 METADATA_ITEM 0) itemoff 16259 itemsize 24 refs 1 gen 1 flags TREE_BLOCK tree block skinny level 0 item 1 key (1048576 TREE_BLOCK_REF 1) itemoff 16259 itemsize 0 tree block backref item 2 key (1048576 BLOCK_GROUP_ITEM 4194304) itemoff 16235 itemsize 24 block group used 98304 chunk_objectid 256 flags SYSTEM item 3 key (1064960 METADATA_ITEM 0) itemoff 16211 itemsize 24 refs 1 gen 1 flags TREE_BLOCK tree block skinny level 0 This is what it looks like. We're basing it off of the key.objectid. If the key.objectid's match, which they will, the block group will always be after it all. It's doing the right thing. Thanks, Josef
On 2021/8/24 上午7:47, Josef Bacik wrote: > On 8/23/21 7:37 PM, Qu Wenruo wrote: >> >> >> On 2021/8/24 上午4:04, Josef Bacik wrote: >>> On 8/23/21 5:00 AM, Qu Wenruo wrote: >>>> >>>> >>>> On 2021/8/21 上午3:11, Josef Bacik wrote: >>>>> Currently we build a bare-bones file system in make_btrfs(), and >>>>> then we >>>>> load it up and fill in the rest of the file system after the fact. >>>>> One >>>>> thing we omit in make_btrfs() is the block group item for the >>>>> temporary >>>>> system chunk we allocate, because we just add it after we've opened >>>>> the >>>>> file system. >>>>> >>>>> However I want to be able to generate the free space tree at >>>>> make_btrfs() time, because extent tree v2 will not have an extent tree >>>>> that has every block allocated in the system. In order to do this I >>>>> need to make sure that the free space tree entries are added on block >>>>> group creation, which is annoying if we have to add this chunk after >>>>> I've created a free space tree. >>>>> >>>>> So make future work simpler by simply adding our block group item at >>>>> make_btrfs() time, this way I can do the right things with the free >>>>> space tree in the generic make block group code without needing a >>>>> special case for our temporary system chunk. >>>>> >>>>> Signed-off-by: Josef Bacik <josef@toxicpanda.com> >>>>> --- >>>>> mkfs/common.c | 31 +++++++++++++++++++++++++++++++ >>>>> mkfs/main.c | 9 ++------- >>>>> 2 files changed, 33 insertions(+), 7 deletions(-) >>>>> >>>>> diff --git a/mkfs/common.c b/mkfs/common.c >>>>> index 9263965e..cba97687 100644 >>>>> --- a/mkfs/common.c >>>>> +++ b/mkfs/common.c >>>>> @@ -190,6 +190,7 @@ int make_btrfs(int fd, struct btrfs_mkfs_config >>>>> *cfg) >>>>> u64 num_bytes; >>>>> u64 system_group_offset = BTRFS_BLOCK_RESERVED_1M_FOR_SUPER; >>>>> u64 system_group_size = BTRFS_MKFS_SYSTEM_GROUP_SIZE; >>>>> + bool add_block_group = true; >>>>> >>>>> if ((cfg->features & BTRFS_FEATURE_INCOMPAT_ZONED)) { >>>>> system_group_offset = cfg->zone_size * >>>>> BTRFS_NR_SB_LOG_ZONES; >>>>> @@ -283,6 +284,36 @@ int make_btrfs(int fd, struct btrfs_mkfs_config >>>>> *cfg) >>>>> if (blk == MKFS_SUPER_BLOCK) >>>>> continue; >>>>> >>>>> + /* Add the block group item for our temporary chunk. */ >>>>> + if (cfg->blocks[blk] > system_group_offset && >>>>> + add_block_group) { >>>> >>>> This makes the block group item always be the first item. >>>> >>>> But for skinny metadata, METADATA_ITEM is smaller than >>>> BLOCK_GROUP_ITEM, >>>> meaning it should be before BLOCK_GROUP_ITEM. >>>> >>>> Won't this cause the extent tree has a bad key order? >>>> >>> >>> No because it's based on the actual bytenr. We'll insert the extent >>> item entry first, and then move to the next block and if it's past the >>> first block we add the block group item, and then the actual extent >>> item. So it goes >>> >>> first block X gets extent item inserted >>> X+1 > X, insert block group item >>> insert X+1 extent item. >>> >> >> But then what if we go non-skinny metadata? >> >> Then block group item is always before any extent item. >> > > item 0 key (1048576 METADATA_ITEM 0) itemoff 16259 itemsize 24 > refs 1 gen 1 flags TREE_BLOCK > tree block skinny level 0 > item 1 key (1048576 TREE_BLOCK_REF 1) itemoff 16259 itemsize 0 > tree block backref > item 2 key (1048576 BLOCK_GROUP_ITEM 4194304) itemoff 16235 > itemsize 24 > block group used 98304 chunk_objectid 256 flags SYSTEM > item 3 key (1064960 METADATA_ITEM 0) itemoff 16211 itemsize 24 > refs 1 gen 1 flags TREE_BLOCK > tree block skinny level 0 > > > This is what it looks like. We're basing it off of the key.objectid. If > the key.objectid's match, which they will, the block group will always > be after it all. It's doing the right thing. Thanks, Oh my bad memory. Both EXTENT_ITEM and METADATA_ITEM are smaller than BLOCK_GROUP_ITEM. So these extent items should always be before BLOCK_GROUP_ITEM, no matter if the skinny metadata is spcififed. Then the code is completely fine. Thanks, Qu > > Josef
diff --git a/mkfs/common.c b/mkfs/common.c index 9263965e..cba97687 100644 --- a/mkfs/common.c +++ b/mkfs/common.c @@ -190,6 +190,7 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg) u64 num_bytes; u64 system_group_offset = BTRFS_BLOCK_RESERVED_1M_FOR_SUPER; u64 system_group_size = BTRFS_MKFS_SYSTEM_GROUP_SIZE; + bool add_block_group = true; if ((cfg->features & BTRFS_FEATURE_INCOMPAT_ZONED)) { system_group_offset = cfg->zone_size * BTRFS_NR_SB_LOG_ZONES; @@ -283,6 +284,36 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg) if (blk == MKFS_SUPER_BLOCK) continue; + /* Add the block group item for our temporary chunk. */ + if (cfg->blocks[blk] > system_group_offset && + add_block_group) { + struct btrfs_block_group_item *bg_item; + + add_block_group = false; + + itemoff -= sizeof(*bg_item); + btrfs_set_disk_key_objectid(&disk_key, + system_group_offset); + btrfs_set_disk_key_offset(&disk_key, + system_group_size); + btrfs_set_disk_key_type(&disk_key, + BTRFS_BLOCK_GROUP_ITEM_KEY); + btrfs_set_item_key(buf, &disk_key, nritems); + btrfs_set_item_offset(buf, btrfs_item_nr(nritems), + itemoff); + btrfs_set_item_size(buf, btrfs_item_nr(nritems), + sizeof(*bg_item)); + + bg_item = btrfs_item_ptr(buf, nritems, + struct btrfs_block_group_item); + btrfs_set_block_group_used(buf, bg_item, total_used); + btrfs_set_block_group_flags(buf, bg_item, + BTRFS_BLOCK_GROUP_SYSTEM); + btrfs_set_block_group_chunk_objectid(buf, bg_item, + BTRFS_FIRST_CHUNK_TREE_OBJECTID); + nritems++; + } + item_size = sizeof(struct btrfs_extent_item); if (!skinny_metadata) item_size += sizeof(struct btrfs_tree_block_info); diff --git a/mkfs/main.c b/mkfs/main.c index eab93eb3..ea53e9c7 100644 --- a/mkfs/main.c +++ b/mkfs/main.c @@ -67,7 +67,6 @@ static int create_metadata_block_groups(struct btrfs_root *root, int mixed, struct btrfs_trans_handle *trans; struct btrfs_space_info *sinfo; u64 flags = BTRFS_BLOCK_GROUP_METADATA; - u64 bytes_used; u64 chunk_start = 0; u64 chunk_size = 0; u64 system_group_offset = BTRFS_BLOCK_RESERVED_1M_FOR_SUPER; @@ -90,16 +89,12 @@ static int create_metadata_block_groups(struct btrfs_root *root, int mixed, trans = btrfs_start_transaction(root, 1); BUG_ON(IS_ERR(trans)); - bytes_used = btrfs_super_bytes_used(fs_info->super_copy); root->fs_info->system_allocs = 1; /* - * First temporary system chunk must match the chunk layout - * created in make_btrfs(). + * We already created the block group item for our temporary system + * chunk in make_btrfs(), so account for the size here. */ - ret = btrfs_make_block_group(trans, fs_info, bytes_used, - BTRFS_BLOCK_GROUP_SYSTEM, - system_group_offset, system_group_size); allocation->system += system_group_size; if (ret) return ret;
Currently we build a bare-bones file system in make_btrfs(), and then we load it up and fill in the rest of the file system after the fact. One thing we omit in make_btrfs() is the block group item for the temporary system chunk we allocate, because we just add it after we've opened the file system. However I want to be able to generate the free space tree at make_btrfs() time, because extent tree v2 will not have an extent tree that has every block allocated in the system. In order to do this I need to make sure that the free space tree entries are added on block group creation, which is annoying if we have to add this chunk after I've created a free space tree. So make future work simpler by simply adding our block group item at make_btrfs() time, this way I can do the right things with the free space tree in the generic make block group code without needing a special case for our temporary system chunk. Signed-off-by: Josef Bacik <josef@toxicpanda.com> --- mkfs/common.c | 31 +++++++++++++++++++++++++++++++ mkfs/main.c | 9 ++------- 2 files changed, 33 insertions(+), 7 deletions(-)