| Message ID | 20190708073352.6095-3-wqu@suse.com (mailing list archive) |
|---|---|
| State | New, archived |
| Series | btrfs-progs: Fix delayed ref leakage |
On 8.07.19 at 10:33, Qu Wenruo wrote:
> In write_one_cache_group() we always do COW to update BLOCK_GROUP_ITEM.
> However under a lot of cases, the cache->item is not changed at all.
>
> E.g:
> Transaction 123, block group [1M, 1M + 16M)
>
> tree block 1M + 0 get freed
> tree block 1M + 16K get allocated.
>
> Transaction 123 get committed.
>
> In this case, used space of block group [1M, 1M + 16M) doesn't changed
> at all, thus we don't need to do COW to update block group item.
>
> This patch will make write_one_cache_group() to do a read-only search
> first, then do COW if we really need to update block group item.
>
> This should reduce the btrfs_write_dirty_block_groups() and
> btrfs_run_delayed_refs() loop introduced in previous commit.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>

I'm not sure how effective this is going to be and isn't this premature
optimization, have you done any measurements?

> ---
>  extent-tree.c | 26 +++++++++++++++++++++++++-
>  1 file changed, 25 insertions(+), 1 deletion(-)
>
> diff --git a/extent-tree.c b/extent-tree.c
> index 932af2c644bd..24d3a1ab3f25 100644
> --- a/extent-tree.c
> +++ b/extent-tree.c
> @@ -1533,10 +1533,34 @@ static int write_one_cache_group(struct btrfs_trans_handle *trans,
>  	unsigned long bi;
>  	struct extent_buffer *leaf;
>
> +	/* Do a read only check to see if we need to update BLOCK_GROUP_ITEM */
> +	ret = btrfs_search_slot(NULL, extent_root, &cache->key, path, 0, 0);
> +	if (ret < 0)
> +		goto fail;
> +	if (ret > 0) {
> +		ret = -ENOENT;
> +		error("failed to find block group %llu in extent tree",
> +		      cache->key.objectid);
> +		goto fail;
> +	}
> +	leaf = path->nodes[0];
> +	bi = btrfs_item_ptr_offset(leaf, path->slots[0]);
> +	ret = memcmp_extent_buffer(leaf, &cache->item, bi, sizeof(cache->item));
> +	btrfs_release_path(path);
> +	/* No need to update */
> +	if (ret == 0)
> +		return ret;
> +
> +	/* Do the COW to update BLOCK_GROUP_ITEM */
>  	ret = btrfs_search_slot(trans, extent_root, &cache->key, path, 0, 1);
>  	if (ret < 0)
>  		goto fail;
> -	BUG_ON(ret);
> +	if (ret > 0) {
> +		ret = -ENOENT;
> +		error("failed to find block group %llu in extent tree",
> +		      cache->key.objectid);
> +		goto fail;
> +	}
>
>  	leaf = path->nodes[0];
>  	bi = btrfs_item_ptr_offset(leaf, path->slots[0]);
On 2019/7/8 at 6:43 PM, Nikolay Borisov wrote:
> On 8.07.19 at 10:33, Qu Wenruo wrote:
>> In write_one_cache_group() we always do COW to update BLOCK_GROUP_ITEM.
>> However under a lot of cases, the cache->item is not changed at all.
>>
>> E.g:
>> Transaction 123, block group [1M, 1M + 16M)
>>
>> tree block 1M + 0 get freed
>> tree block 1M + 16K get allocated.
>>
>> Transaction 123 get committed.
>>
>> In this case, used space of block group [1M, 1M + 16M) doesn't changed
>> at all, thus we don't need to do COW to update block group item.
>>
>> This patch will make write_one_cache_group() to do a read-only search
>> first, then do COW if we really need to update block group item.
>>
>> This should reduce the btrfs_write_dirty_block_groups() and
>> btrfs_run_delayed_refs() loop introduced in previous commit.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>
> I'm not sure how effective this is going to be

The effectiveness is indeed low.

For btrfs/013 test case, 64K page size, it reduces total number of
delayed refs by less than 2% (5/300+)

And similar result for total number of dirty block groups.

> and isn't this premature
> optimization, have you done any measurements?

For the optimization part, I'd say it should be pretty safe.
It just really skips unnecessary CoW.

The only downside to me is the extra tree search, thus killing the
"optimization" part.

Thanks,
Qu

<snip>
On 8.07.19 at 15:50, Qu Wenruo wrote:
> On 2019/7/8 at 6:43 PM, Nikolay Borisov wrote:
>> On 8.07.19 at 10:33, Qu Wenruo wrote:

<snip>

>> I'm not sure how effective this is going to be
>
> The effectiveness is indeed low.
>
> For btrfs/013 test case, 64K page size, it reduces total number of
> delayed refs by less than 2% (5/300+)
>
> And similar result for total number of dirty block groups.
>
>> and isn't this premature
>> optimization, have you done any measurements?
>
> For the optimization part, I'd say it should be pretty safe.
> It just really skips unnecessary CoW.
>
> The only downside to me is the extra tree search, thus killing the
> "optimization" part.

If that's the case then I'd rather see the 2nd patch dropped. It adds
more code for no gain.

<snip>
On 2019/7/8 at 9:07 PM, Nikolay Borisov wrote:
> On 8.07.19 at 15:50, Qu Wenruo wrote:
>> On 2019/7/8 at 6:43 PM, Nikolay Borisov wrote:
>>> On 8.07.19 at 10:33, Qu Wenruo wrote:

<snip>

> If that's the case then I'd rather see the 2nd patch dropped. It adds
> more code for no gain.

Makes sense. I'm OK to drop it.

Thanks,
Qu
On Mon, Jul 08, 2019 at 09:30:04PM +0800, Qu Wenruo wrote:
> On 2019/7/8 at 9:07 PM, Nikolay Borisov wrote:
> > If that's the case then I'd rather see the 2nd patch dropped. It adds
> > more code for no gain.
>
> Makes sense. I'm OK to drop it.

Ok, dropped.
```diff
diff --git a/extent-tree.c b/extent-tree.c
index 932af2c644bd..24d3a1ab3f25 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -1533,10 +1533,34 @@ static int write_one_cache_group(struct btrfs_trans_handle *trans,
 	unsigned long bi;
 	struct extent_buffer *leaf;
 
+	/* Do a read only check to see if we need to update BLOCK_GROUP_ITEM */
+	ret = btrfs_search_slot(NULL, extent_root, &cache->key, path, 0, 0);
+	if (ret < 0)
+		goto fail;
+	if (ret > 0) {
+		ret = -ENOENT;
+		error("failed to find block group %llu in extent tree",
+		      cache->key.objectid);
+		goto fail;
+	}
+	leaf = path->nodes[0];
+	bi = btrfs_item_ptr_offset(leaf, path->slots[0]);
+	ret = memcmp_extent_buffer(leaf, &cache->item, bi, sizeof(cache->item));
+	btrfs_release_path(path);
+	/* No need to update */
+	if (ret == 0)
+		return ret;
+
+	/* Do the COW to update BLOCK_GROUP_ITEM */
 	ret = btrfs_search_slot(trans, extent_root, &cache->key, path, 0, 1);
 	if (ret < 0)
 		goto fail;
-	BUG_ON(ret);
+	if (ret > 0) {
+		ret = -ENOENT;
+		error("failed to find block group %llu in extent tree",
+		      cache->key.objectid);
+		goto fail;
+	}
 
 	leaf = path->nodes[0];
 	bi = btrfs_item_ptr_offset(leaf, path->slots[0]);
```
In write_one_cache_group() we always do COW to update the BLOCK_GROUP_ITEM.
However, in a lot of cases the cache->item is not changed at all.

E.g.:
Transaction 123, block group [1M, 1M + 16M)

  tree block 1M + 0 gets freed
  tree block 1M + 16K gets allocated

Transaction 123 gets committed.

In this case the used space of block group [1M, 1M + 16M) doesn't change
at all, thus we don't need to do COW to update the block group item.

This patch makes write_one_cache_group() do a read-only search first,
then do COW only if we really need to update the block group item.

This should reduce the btrfs_write_dirty_block_groups() and
btrfs_run_delayed_refs() loop introduced in the previous commit.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 extent-tree.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)