
btrfs: use correct compare function of dirty_metadata_bytes

Message ID 20180702074458.16270-1-ethanlien@synology.com (mailing list archive)
State New, archived

Commit Message

ethanlien July 2, 2018, 7:44 a.m. UTC
We use a customized, nodesize-based batch value to update
dirty_metadata_bytes. We should also use the batch version of the
compare function, or we will easily take the fast path and get a wrong
result from percpu_counter_compare().

Signed-off-by: Ethan Lien <ethanlien@synology.com>
---
 fs/btrfs/disk-io.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

Comments

Nikolay Borisov July 2, 2018, 7:49 a.m. UTC | #1
On  2.07.2018 10:44, Ethan Lien wrote:
> We use a customized, nodesize-based batch value to update
> dirty_metadata_bytes. We should also use the batch version of the
> compare function, or we will easily take the fast path and get a wrong
> result from percpu_counter_compare().
> 
> Signed-off-by: Ethan Lien <ethanlien@synology.com>

Reviewed-by: Nikolay Borisov <nborisov@suse.com>

Also this needs the following tag: 

Fixes: e2d845211eda ("Btrfs: use percpu counter for dirty metadata count")

And perhaps stable from 4.4 onwards. But David will take care
of that. So the patch is good. 

> ---
>  fs/btrfs/disk-io.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 205092dc9390..dfed08e70ec1 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -961,8 +961,9 @@ static int btree_writepages(struct address_space *mapping,
>  
>  		fs_info = BTRFS_I(mapping->host)->root->fs_info;
>  		/* this is a bit racy, but that's ok */
> -		ret = percpu_counter_compare(&fs_info->dirty_metadata_bytes,
> -					     BTRFS_DIRTY_METADATA_THRESH);
> +		ret = __percpu_counter_compare(&fs_info->dirty_metadata_bytes,
> +					     BTRFS_DIRTY_METADATA_THRESH,
> +					     fs_info->dirty_metadata_batch);
>  		if (ret < 0)
>  			return 0;
>  	}
> @@ -4150,8 +4151,9 @@ static void __btrfs_btree_balance_dirty(struct btrfs_fs_info *fs_info,
>  	if (flush_delayed)
>  		btrfs_balance_delayed_items(fs_info);
>  
> -	ret = percpu_counter_compare(&fs_info->dirty_metadata_bytes,
> -				     BTRFS_DIRTY_METADATA_THRESH);
> +	ret = __percpu_counter_compare(&fs_info->dirty_metadata_bytes,
> +				     BTRFS_DIRTY_METADATA_THRESH,
> +				     fs_info->dirty_metadata_batch);
>  	if (ret > 0) {
>  		balance_dirty_pages_ratelimited(fs_info->btree_inode->i_mapping);
>  	}
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Sterba July 4, 2018, 4:26 p.m. UTC | #2
On Mon, Jul 02, 2018 at 03:44:58PM +0800, Ethan Lien wrote:
> We use a customized, nodesize-based batch value to update
> dirty_metadata_bytes. We should also use the batch version of the
> compare function, or we will easily take the fast path and get a wrong
> result from percpu_counter_compare().
> 
> Signed-off-by: Ethan Lien <ethanlien@synology.com>

This looks like it could have some observable effects. The default batch
is 32 * cpus online, while the one supplied by btrfs is much higher as
it's a multiple of nodesize.

BTRFS_DIRTY_METADATA_THRESH is 32MiB, so the comparison is likely to be
off in some if not most cases. I got lost in the call chains and how the
return values of writepage are interpreted.

The effect I see is: many pages are batched (by the _add_batch calls),
but none of the compare calls detects that, and we exit early. So more
metadata is built up in memory and will have to be synced later.

The change looks correct to me, we have to compare the numbers of the
same units. I'll add it to misc-next for testing, maybe the 0day bot
would detect some changes in performance.

Please let me know if you have further insights or clarifications to
what's written above, the changelog could use some update as the change
has some unobvious consequences.
David Sterba July 4, 2018, 4:55 p.m. UTC | #3
On Mon, Jul 02, 2018 at 10:49:46AM +0300, Nikolay Borisov wrote:
> On  2.07.2018 10:44, Ethan Lien wrote:
> > We use a customized, nodesize-based batch value to update
> > dirty_metadata_bytes. We should also use the batch version of the
> > compare function, or we will easily take the fast path and get a
> > wrong result from percpu_counter_compare().
> > 
> > Signed-off-by: Ethan Lien <ethanlien@synology.com>
> 
> Reviewed-by: Nikolay Borisov <nborisov@suse.com>
> 
> Also this needs the following tag: 
> 
> Fixes: e2d845211eda ("Btrfs: use percpu counter for dirty metadata count")

I'll add the tags, thanks.

2939         fs_info->dirty_metadata_batch = nodesize * (1 + ilog2(nr_cpu_ids));

What a commit, sigh. Arbitrary constants without any explanation. I
don't quite understand why ilog2(nr_cpu_ids) was chosen. If cpus are
onlined/offlined there's no adjustment. I'd think that there should be
some amount of data that is supposed to be dirtied per cpu, and this
value should be independent of the number of cpus, since the
online/offline status is abstracted by the percpu API. I don't have time
to study that too deeply, but this is something that needs to be
revisited.
Nikolay Borisov July 4, 2018, 4:58 p.m. UTC | #4
On  4.07.2018 19:55, David Sterba wrote:
> On Mon, Jul 02, 2018 at 10:49:46AM +0300, Nikolay Borisov wrote:
>> On  2.07.2018 10:44, Ethan Lien wrote:
>>> We use customized, nodesize batch value to update dirty_metadata_bytes.
>>> We should also use batch version of compare function or we will easily
>>> goto fast path and get false result from percpu_counter_compare().
>>>
>>> Signed-off-by: Ethan Lien <ethanlien@synology.com>
>>
>> Reviewed-by: Nikolay Borisov <nborisov@suse.com>
>>
>> Also this needs the following tag: 
>>
>> Fixes: e2d845211eda ("Btrfs: use percpu counter for dirty metadata count")
> 
> I'll add the tags, thanks.
> 
> 2939         fs_info->dirty_metadata_batch = nodesize * (1 + ilog2(nr_cpu_ids));
> 
> What a commit, sigh. Arbitrary constants without any explanation. I
> don't quite understand why ilog2(nr_cpu_ids) was chosen. If cpus are
> onlined/offlined there's no adjustment. I'd think that there should be
> some amount of data that is supposed to be dirtied per cpu, and this
> value should be independent of the number of cpus, since the
> online/offline status is abstracted by the percpu API. I don't have
> time to study that too deeply, but this is something that needs to be
> revisited.

TBH, perhaps the safest approach would be to modify the code to
eliminate the custom batch-related variables/code entirely. The code has
been working like that for years.

Patch

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 205092dc9390..dfed08e70ec1 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -961,8 +961,9 @@ static int btree_writepages(struct address_space *mapping,
 
 		fs_info = BTRFS_I(mapping->host)->root->fs_info;
 		/* this is a bit racy, but that's ok */
-		ret = percpu_counter_compare(&fs_info->dirty_metadata_bytes,
-					     BTRFS_DIRTY_METADATA_THRESH);
+		ret = __percpu_counter_compare(&fs_info->dirty_metadata_bytes,
+					     BTRFS_DIRTY_METADATA_THRESH,
+					     fs_info->dirty_metadata_batch);
 		if (ret < 0)
 			return 0;
 	}
@@ -4150,8 +4151,9 @@ static void __btrfs_btree_balance_dirty(struct btrfs_fs_info *fs_info,
 	if (flush_delayed)
 		btrfs_balance_delayed_items(fs_info);
 
-	ret = percpu_counter_compare(&fs_info->dirty_metadata_bytes,
-				     BTRFS_DIRTY_METADATA_THRESH);
+	ret = __percpu_counter_compare(&fs_info->dirty_metadata_bytes,
+				     BTRFS_DIRTY_METADATA_THRESH,
+				     fs_info->dirty_metadata_batch);
 	if (ret > 0) {
 		balance_dirty_pages_ratelimited(fs_info->btree_inode->i_mapping);
 	}