
[v2,2/2] btrfs-progs: Avoid unnecessary block group item COW if the content hasn't changed

Message ID 20190708073352.6095-3-wqu@suse.com (mailing list archive)
State: New, archived
Series: btrfs-progs: Fix delayed ref leakage

Commit Message

Qu Wenruo July 8, 2019, 7:33 a.m. UTC
In write_one_cache_group() we always do a COW to update the
BLOCK_GROUP_ITEM. However, in many cases cache->item has not changed at
all.

E.g.:
Transaction 123, block group [1M, 1M + 16M)

tree block 1M + 0 gets freed
tree block 1M + 16K gets allocated.

Transaction 123 gets committed.

In this case, the used space of block group [1M, 1M + 16M) doesn't
change at all, so we don't need to do a COW to update the block group
item.

This patch makes write_one_cache_group() do a read-only search first,
and only do the COW if the block group item really needs to be updated.

This should reduce the btrfs_write_dirty_block_groups() and
btrfs_run_delayed_refs() loop introduced in the previous commit.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 extent-tree.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

Comments

Nikolay Borisov July 8, 2019, 10:43 a.m. UTC | #1
On 8.07.19 at 10:33, Qu Wenruo wrote:
> <snip>

I'm not sure how effective this is going to be and isn't this premature
optimization, have you done any measurements?


> <snip>
Qu Wenruo July 8, 2019, 12:50 p.m. UTC | #2
On 2019/7/8 6:43 PM, Nikolay Borisov wrote:
>
>
> On 8.07.19 at 10:33, Qu Wenruo wrote:
>> <snip>
>
> I'm not sure how effective this is going to be

The effectiveness is indeed low.

For the btrfs/013 test case with 64K page size, it reduces the total
number of delayed refs by less than 2% (5 out of 300+).

The result is similar for the total number of dirty block groups.

> and isn't this premature
> optimization, have you done any measurements?

For the optimization part, I'd say it should be pretty safe.
It really just skips unnecessary COW.

The only downside to me is the extra tree search, which ends up
killing the "optimization" part.

Thanks,
Qu
>
>
>> <snip>
Nikolay Borisov July 8, 2019, 1:07 p.m. UTC | #3
On 8.07.19 at 15:50, Qu Wenruo wrote:
> 
> 
> On 2019/7/8 6:43 PM, Nikolay Borisov wrote:
>>
>>
>> On 8.07.19 at 10:33, Qu Wenruo wrote:
>>> <snip>
>>
>> I'm not sure how effective this is going to be
> 
> The effectiveness is indeed low.
> 
> For the btrfs/013 test case with 64K page size, it reduces the total
> number of delayed refs by less than 2% (5 out of 300+).
> 
> The result is similar for the total number of dirty block groups.
> 
>> and isn't this premature
>> optimization, have you done any measurements?
> 
> For the optimization part, I'd say it should be pretty safe.
> It really just skips unnecessary COW.
> 
> The only downside to me is the extra tree search, which ends up
> killing the "optimization" part.
> 

If that's the case then I'd rather see the 2nd patch dropped. It adds
more code for no gain.

<snip>
Qu Wenruo July 8, 2019, 1:30 p.m. UTC | #4
On 2019/7/8 9:07 PM, Nikolay Borisov wrote:
>
>
> On 8.07.19 at 15:50, Qu Wenruo wrote:
>>
>>
>> On 2019/7/8 6:43 PM, Nikolay Borisov wrote:
>>>
>>>
>>> On 8.07.19 at 10:33, Qu Wenruo wrote:
>>>> <snip>
>>>
>>> I'm not sure how effective this is going to be
>>
>> The effectiveness is indeed low.
>>
>> For the btrfs/013 test case with 64K page size, it reduces the total
>> number of delayed refs by less than 2% (5 out of 300+).
>>
>> The result is similar for the total number of dirty block groups.
>>
>>> and isn't this premature
>>> optimization, have you done any measurements?
>>
>> For the optimization part, I'd say it should be pretty safe.
>> It really just skips unnecessary COW.
>>
>> The only downside to me is the extra tree search, which ends up
>> killing the "optimization" part.
>>
>
> If that's the case then I'd rather see the 2nd patch dropped. It adds
> more code for no gain.

Makes sense. I'm OK to drop it.

Thanks,
Qu
>
> <snip>
>
David Sterba July 22, 2019, 12:59 p.m. UTC | #5
On Mon, Jul 08, 2019 at 09:30:04PM +0800, Qu Wenruo wrote:
> On 2019/7/8 9:07 PM, Nikolay Borisov wrote:
> > On 8.07.19 at 15:50, Qu Wenruo wrote:
> >> On 2019/7/8 6:43 PM, Nikolay Borisov wrote:
> >>> On 8.07.19 at 10:33, Qu Wenruo wrote:
> > If that's the case then I'd rather see the 2nd patch dropped. It adds
> > more code for no gain.
> 
> Makes sense. I'm OK to drop it.

Ok, dropped.

Patch

diff --git a/extent-tree.c b/extent-tree.c
index 932af2c644bd..24d3a1ab3f25 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -1533,10 +1533,34 @@ static int write_one_cache_group(struct btrfs_trans_handle *trans,
 	unsigned long bi;
 	struct extent_buffer *leaf;
 
+	/* Do a read only check to see if we need to update BLOCK_GROUP_ITEM */
+	ret = btrfs_search_slot(NULL, extent_root, &cache->key, path, 0, 0);
+	if (ret < 0)
+		goto fail;
+	if (ret > 0) {
+		ret = -ENOENT;
+		error("failed to find block group %llu in extent tree",
+			cache->key.objectid);
+		goto fail;
+	}
+	leaf = path->nodes[0];
+	bi = btrfs_item_ptr_offset(leaf, path->slots[0]);
+	ret = memcmp_extent_buffer(leaf, &cache->item, bi, sizeof(cache->item));
+	btrfs_release_path(path);
+	/* No need to update */
+	if (ret == 0)
+		return ret;
+
+	/* Do the COW to update BLOCK_GROUP_ITEM */
 	ret = btrfs_search_slot(trans, extent_root, &cache->key, path, 0, 1);
 	if (ret < 0)
 		goto fail;
-	BUG_ON(ret);
+	if (ret > 0) {
+		ret = -ENOENT;
+		error("failed to find block group %llu in extent tree",
+			cache->key.objectid);
+		goto fail;
+	}
 
 	leaf = path->nodes[0];
 	bi = btrfs_item_ptr_offset(leaf, path->slots[0]);