Message ID | 20200505000230.4454-1-wqu@suse.com (mailing list archive) |
---|---|
Headers | show |
Series | btrfs-progs: Support for SKINNY_BG_TREE feature | expand |
On Tue, May 05, 2020 at 08:02:19AM +0800, Qu Wenruo wrote: > This patchset can be fetched from github: > https://github.com/adam900710/btrfs-progs/tree/skinny_bg_tree > Which is based on v5.6 tag, with extra cleanups (sent to mail list) applied. > > This patchset provides the needed user space infrastructure for SKINNY_BG_TREE > feature. > > Since it's an new incompatible feature, unlike SKINNY_METADATA, btrfs-progs > is needed to convert existing fs (unmounted) to new format, and > vice-verse. > > Now btrfstune can convert regular extent tree fs to bg tree fs to > improve mount time. > > For the performance improvement, please check the kernel patchset cover > letter or the last patch. > (SPOILER ALERT: It's super fast) > > Changelog: > v2: > - Rebase to v5.2.2 tag > - Add btrfstune ability to convert existing fs to BG_TREE feature > > v3: > - Fix a bug that temp chunks are not cleaned up properly > This is caused by wrong timing btrfs_convert_to_bg_tree() is called. > It should be called after temp chunks cleaned up. > > - Fix a bug that an extent buffer get leaked > This is caused by newly created bg tree not added to dirty list. > > v4: > - Go with skinny bg tree other than regular block group item > We're introducing a new incompatible feature anyway, why not go > extreme? > > - Use the same refactor as kernel. > To make code much cleaner and easier to read. > > - Add the ability to rollback to regular extent tree. > So confident tester can try SKINNY_BG_TREE using their real world > data, and rollback if they still want to mount it using older kernels. > > Qu Wenruo (11): > btrfs-progs: check/lowmem: Lookup block group item in a seperate > function > btrfs-progs: block-group: Refactor how we read one block group item > btrfs-progs: Rename btrfs_remove_block_group() and > free_block_group_item() > btrfs-progs: block-group: Refactor how we insert a block group item > btrfs-progs: block-group: Rename write_one_cahce_group() I'll add the above patches independently, for the rest I don't know. I still think the separate tree is somehow wrong so have to convince myself that it's not.
On 2020/5/12 上午2:58, David Sterba wrote: > On Tue, May 05, 2020 at 08:02:19AM +0800, Qu Wenruo wrote: >> This patchset can be fetched from github: >> https://github.com/adam900710/btrfs-progs/tree/skinny_bg_tree >> Which is based on v5.6 tag, with extra cleanups (sent to mail list) applied. >> >> This patchset provides the needed user space infrastructure for SKINNY_BG_TREE >> feature. >> >> Since it's an new incompatible feature, unlike SKINNY_METADATA, btrfs-progs >> is needed to convert existing fs (unmounted) to new format, and >> vice-verse. >> >> Now btrfstune can convert regular extent tree fs to bg tree fs to >> improve mount time. >> >> For the performance improvement, please check the kernel patchset cover >> letter or the last patch. >> (SPOILER ALERT: It's super fast) >> >> Changelog: >> v2: >> - Rebase to v5.2.2 tag >> - Add btrfstune ability to convert existing fs to BG_TREE feature >> >> v3: >> - Fix a bug that temp chunks are not cleaned up properly >> This is caused by wrong timing btrfs_convert_to_bg_tree() is called. >> It should be called after temp chunks cleaned up. >> >> - Fix a bug that an extent buffer get leaked >> This is caused by newly created bg tree not added to dirty list. >> >> v4: >> - Go with skinny bg tree other than regular block group item >> We're introducing a new incompatible feature anyway, why not go >> extreme? >> >> - Use the same refactor as kernel. >> To make code much cleaner and easier to read. >> >> - Add the ability to rollback to regular extent tree. >> So confident tester can try SKINNY_BG_TREE using their real world >> data, and rollback if they still want to mount it using older kernels. >> >> Qu Wenruo (11): >> btrfs-progs: check/lowmem: Lookup block group item in a seperate >> function >> btrfs-progs: block-group: Refactor how we read one block group item >> btrfs-progs: Rename btrfs_remove_block_group() and >> free_block_group_item() >> btrfs-progs: block-group: Refactor how we insert a block group item >> btrfs-progs: block-group: Rename write_one_cahce_group() > > I'll add the above patches independently, for the rest I don't know. I > still think the separate tree is somehow wrong so have to convince > myself that it's not. No problem. Since the refactor would be the basis for whatever the final method we choose, it should be pretty OK. Even if we go something like (0, NEW_BLOCK_GROUP_ITEM, bytenr) in extent tree, the refactor still makes a lot of sense. BTW, if we merge the skip_bg mount option before this patchset, we could even make the skinny_bg_tree feature RO compactable. So it would be a good time reviewing that feature. Thanks, Qu
On 2020/5/12 上午2:58, David Sterba wrote: > On Tue, May 05, 2020 at 08:02:19AM +0800, Qu Wenruo wrote: >> This patchset can be fetched from github: >> https://github.com/adam900710/btrfs-progs/tree/skinny_bg_tree >> Which is based on v5.6 tag, with extra cleanups (sent to mail list) applied. >> >> This patchset provides the needed user space infrastructure for SKINNY_BG_TREE >> feature. >> >> Since it's an new incompatible feature, unlike SKINNY_METADATA, btrfs-progs >> is needed to convert existing fs (unmounted) to new format, and >> vice-verse. >> >> Now btrfstune can convert regular extent tree fs to bg tree fs to >> improve mount time. >> >> For the performance improvement, please check the kernel patchset cover >> letter or the last patch. >> (SPOILER ALERT: It's super fast) >> >> Changelog: >> v2: >> - Rebase to v5.2.2 tag >> - Add btrfstune ability to convert existing fs to BG_TREE feature >> >> v3: >> - Fix a bug that temp chunks are not cleaned up properly >> This is caused by wrong timing btrfs_convert_to_bg_tree() is called. >> It should be called after temp chunks cleaned up. >> >> - Fix a bug that an extent buffer get leaked >> This is caused by newly created bg tree not added to dirty list. >> >> v4: >> - Go with skinny bg tree other than regular block group item >> We're introducing a new incompatible feature anyway, why not go >> extreme? >> >> - Use the same refactor as kernel. >> To make code much cleaner and easier to read. >> >> - Add the ability to rollback to regular extent tree. >> So confident tester can try SKINNY_BG_TREE using their real world >> data, and rollback if they still want to mount it using older kernels. >> >> Qu Wenruo (11): >> btrfs-progs: check/lowmem: Lookup block group item in a seperate >> function >> btrfs-progs: block-group: Refactor how we read one block group item >> btrfs-progs: Rename btrfs_remove_block_group() and >> free_block_group_item() >> btrfs-progs: block-group: Refactor how we insert a block group item >> btrfs-progs: block-group: Rename write_one_cahce_group() > > I'll add the above patches independently, for the rest I don't know. I > still think the separate tree is somehow wrong so have to convince > myself that it's not. > One interesting advantage here is, separate block group tree would hugely reduce the possibility to fail to mount due to corrupted extent tree. There are two reports of different corruption on extent tree already in the mail list in the last 24 hours. While the skinny bg tree could hugely reduce the amount of block group items, which means less possibility to corrupt. And since we have less tree blocks for block group tree, the cow cost would also be reduced obviously. As one BGI (just a key) get modified, all modification to other keys in that leaf won't lead to new COW until next transaction. So personally I believe it's much better than regular extent tree. Thanks, Qu
On 12.05.20 г. 5:30 ч., Qu Wenruo wrote: > > > On 2020/5/12 上午2:58, David Sterba wrote: >> On Tue, May 05, 2020 at 08:02:19AM +0800, Qu Wenruo wrote: >>> This patchset can be fetched from github: >>> https://github.com/adam900710/btrfs-progs/tree/skinny_bg_tree >>> Which is based on v5.6 tag, with extra cleanups (sent to mail list) applied. >>> >>> This patchset provides the needed user space infrastructure for SKINNY_BG_TREE >>> feature. >>> >>> Since it's an new incompatible feature, unlike SKINNY_METADATA, btrfs-progs >>> is needed to convert existing fs (unmounted) to new format, and >>> vice-verse. >>> >>> Now btrfstune can convert regular extent tree fs to bg tree fs to >>> improve mount time. >>> >>> For the performance improvement, please check the kernel patchset cover >>> letter or the last patch. >>> (SPOILER ALERT: It's super fast) >>> >>> Changelog: >>> v2: >>> - Rebase to v5.2.2 tag >>> - Add btrfstune ability to convert existing fs to BG_TREE feature >>> >>> v3: >>> - Fix a bug that temp chunks are not cleaned up properly >>> This is caused by wrong timing btrfs_convert_to_bg_tree() is called. >>> It should be called after temp chunks cleaned up. >>> >>> - Fix a bug that an extent buffer get leaked >>> This is caused by newly created bg tree not added to dirty list. >>> >>> v4: >>> - Go with skinny bg tree other than regular block group item >>> We're introducing a new incompatible feature anyway, why not go >>> extreme? >>> >>> - Use the same refactor as kernel. >>> To make code much cleaner and easier to read. >>> >>> - Add the ability to rollback to regular extent tree. >>> So confident tester can try SKINNY_BG_TREE using their real world >>> data, and rollback if they still want to mount it using older kernels. >>> >>> Qu Wenruo (11): >>> btrfs-progs: check/lowmem: Lookup block group item in a seperate >>> function >>> btrfs-progs: block-group: Refactor how we read one block group item >>> btrfs-progs: Rename btrfs_remove_block_group() and >>> free_block_group_item() >>> btrfs-progs: block-group: Refactor how we insert a block group item >>> btrfs-progs: block-group: Rename write_one_cahce_group() >> >> I'll add the above patches independently, for the rest I don't know. I >> still think the separate tree is somehow wrong so have to convince >> myself that it's not. >> > One interesting advantage here is, separate block group tree would > hugely reduce the possibility to fail to mount due to corrupted extent tree. > There are two reports of different corruption on extent tree already in > the mail list in the last 24 hours. > > While the skinny bg tree could hugely reduce the amount of block group > items, which means less possibility to corrupt. > > And since we have less tree blocks for block group tree, the cow cost > would also be reduced obviously. > As one BGI (just a key) get modified, all modification to other keys in > that leaf won't lead to new COW until next transaction. > > So personally I believe it's much better than regular extent tree. Perhaps it will be more convincing if you could substantiate those claims with numbers. I.e run some benchmarks and show numbers under what cases the added complexity brings positives to the table. > > Thanks, > Qu >
On 2020/5/12 下午4:21, Nikolay Borisov wrote: > > > On 12.05.20 г. 5:30 ч., Qu Wenruo wrote: >> >> >> On 2020/5/12 上午2:58, David Sterba wrote: >>> On Tue, May 05, 2020 at 08:02:19AM +0800, Qu Wenruo wrote: >>>> This patchset can be fetched from github: >>>> https://github.com/adam900710/btrfs-progs/tree/skinny_bg_tree >>>> Which is based on v5.6 tag, with extra cleanups (sent to mail list) applied. >>>> >>>> This patchset provides the needed user space infrastructure for SKINNY_BG_TREE >>>> feature. >>>> >>>> Since it's an new incompatible feature, unlike SKINNY_METADATA, btrfs-progs >>>> is needed to convert existing fs (unmounted) to new format, and >>>> vice-verse. >>>> >>>> Now btrfstune can convert regular extent tree fs to bg tree fs to >>>> improve mount time. >>>> >>>> For the performance improvement, please check the kernel patchset cover >>>> letter or the last patch. >>>> (SPOILER ALERT: It's super fast) >>>> >>>> Changelog: >>>> v2: >>>> - Rebase to v5.2.2 tag >>>> - Add btrfstune ability to convert existing fs to BG_TREE feature >>>> >>>> v3: >>>> - Fix a bug that temp chunks are not cleaned up properly >>>> This is caused by wrong timing btrfs_convert_to_bg_tree() is called. >>>> It should be called after temp chunks cleaned up. >>>> >>>> - Fix a bug that an extent buffer get leaked >>>> This is caused by newly created bg tree not added to dirty list. >>>> >>>> v4: >>>> - Go with skinny bg tree other than regular block group item >>>> We're introducing a new incompatible feature anyway, why not go >>>> extreme? >>>> >>>> - Use the same refactor as kernel. >>>> To make code much cleaner and easier to read. >>>> >>>> - Add the ability to rollback to regular extent tree. >>>> So confident tester can try SKINNY_BG_TREE using their real world >>>> data, and rollback if they still want to mount it using older kernels. >>>> >>>> Qu Wenruo (11): >>>> btrfs-progs: check/lowmem: Lookup block group item in a seperate >>>> function >>>> btrfs-progs: block-group: Refactor how we read one block group item >>>> btrfs-progs: Rename btrfs_remove_block_group() and >>>> free_block_group_item() >>>> btrfs-progs: block-group: Refactor how we insert a block group item >>>> btrfs-progs: block-group: Rename write_one_cahce_group() >>> >>> I'll add the above patches independently, for the rest I don't know. I >>> still think the separate tree is somehow wrong so have to convince >>> myself that it's not. >>> >> One interesting advantage here is, separate block group tree would >> hugely reduce the possibility to fail to mount due to corrupted extent tree. >> There are two reports of different corruption on extent tree already in >> the mail list in the last 24 hours. >> >> While the skinny bg tree could hugely reduce the amount of block group >> items, which means less possibility to corrupt. >> >> And since we have less tree blocks for block group tree, the cow cost >> would also be reduced obviously. >> As one BGI (just a key) get modified, all modification to other keys in >> that leaf won't lead to new COW until next transaction. >> >> So personally I believe it's much better than regular extent tree. > > Perhaps it will be more convincing if you could substantiate those > claims with numbers. I.e run some benchmarks and show numbers under what > cases the added complexity brings positives to the table. Exactly in the patch implementing the feature: | Extent tree | Skinny bg tree | ---------------------------------------------- nodes | 55 | 1 | leaves | 1025 | 7 | total | 1080 | 8 | That's 1T used fs with 4K node size, which has at least 1024 data block groups. The above number is the needed tree blocks to be iterated to read all block group items. Thanks, Qu > >> >> Thanks, >> Qu >>