Message ID | 20180702074458.16270-1-ethanlien@synology.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 2.07.2018 10:44, Ethan Lien wrote: > We use customized, nodesize batch value to update dirty_metadata_bytes. > We should also use batch version of compare function or we will easily > goto fast path and get false result from percpu_counter_compare(). > > Signed-off-by: Ethan Lien <ethanlien@synology.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Also this needs the following tag: Fixes: e2d845211eda ("Btrfs: use percpu counter for dirty metadata count") And perhaps stable from 4.4 onwards. But David will take care of that. So the patch is good. > --- > fs/btrfs/disk-io.c | 10 ++++++---- > 1 file changed, 6 insertions(+), 4 deletions(-) > > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > index 205092dc9390..dfed08e70ec1 100644 > --- a/fs/btrfs/disk-io.c > +++ b/fs/btrfs/disk-io.c > @@ -961,8 +961,9 @@ static int btree_writepages(struct address_space *mapping, > > fs_info = BTRFS_I(mapping->host)->root->fs_info; > /* this is a bit racy, but that's ok */ > - ret = percpu_counter_compare(&fs_info->dirty_metadata_bytes, > - BTRFS_DIRTY_METADATA_THRESH); > + ret = __percpu_counter_compare(&fs_info->dirty_metadata_bytes, > + BTRFS_DIRTY_METADATA_THRESH, > + fs_info->dirty_metadata_batch); > if (ret < 0) > return 0; > } > @@ -4150,8 +4151,9 @@ static void __btrfs_btree_balance_dirty(struct btrfs_fs_info *fs_info, > if (flush_delayed) > btrfs_balance_delayed_items(fs_info); > > - ret = percpu_counter_compare(&fs_info->dirty_metadata_bytes, > - BTRFS_DIRTY_METADATA_THRESH); > + ret = __percpu_counter_compare(&fs_info->dirty_metadata_bytes, > + BTRFS_DIRTY_METADATA_THRESH, > + fs_info->dirty_metadata_batch); > if (ret > 0) { > balance_dirty_pages_ratelimited(fs_info->btree_inode->i_mapping); > } > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Jul 02, 2018 at 03:44:58PM +0800, Ethan Lien wrote: > We use customized, nodesize batch value to update dirty_metadata_bytes. > We should also use batch version of compare function or we will easily > goto fast path and get false result from percpu_counter_compare(). > > Signed-off-by: Ethan Lien <ethanlien@synology.com> This looks like it could have some observable effects. The default batch is 32 * cpus online, while the one supplied by btrfs is much higher as it's a multiple of nodesize. The BTRFS_DIRTY_METADATA_THRESH is 32MiB so the comparision is likely to be off scale in some if not most cases. I got lost in the callchains and how the return values of writepage are interpreted. The effect I see is: there are many pages batched (by the _add_batch calls) but none of the compare calls dects that and exits early. So there are more metadata being built up in memory and will have to be synced later. The change looks correct to me, we have to compare the numbers of the same units. I'll add it to misc-next for testing, maybe the 0day bot would detect some changes in performance. Please let me know if you have further insights or clarifications to what's written above, the changelog could use some update as the change has some unobvious consequences. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Jul 02, 2018 at 10:49:46AM +0300, Nikolay Borisov wrote: > On 2.07.2018 10:44, Ethan Lien wrote: > > We use customized, nodesize batch value to update dirty_metadata_bytes. > > We should also use batch version of compare function or we will easily > > goto fast path and get false result from percpu_counter_compare(). > > > > Signed-off-by: Ethan Lien <ethanlien@synology.com> > > Reviewed-by: Nikolay Borisov <nborisov@suse.com> > > Also this needs the following tag: > > Fixes: e2d845211eda ("Btrfs: use percpu counter for dirty metadata count") I'll add the tags, thanks. 2939 fs_info->dirty_metadata_batch = nodesize * (1 + ilog2(nr_cpu_ids)); What a commit, sigh. Arbitrary constants without any explanation. I don't quite understand why the ilog2(nr_cpui_ids) was chosen. If cpus are onlined/offlined there's no adjustment. I'd think that there should be some amount of data that are supposed to be dirtied per-cpu and this value is independent of the number of cpus. That the online/offline status is abstracted by the percpu API. I don't have time to study that too deeply, but this is something that needs to be revisited. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 4.07.2018 19:55, David Sterba wrote: > On Mon, Jul 02, 2018 at 10:49:46AM +0300, Nikolay Borisov wrote: >> On 2.07.2018 10:44, Ethan Lien wrote: >>> We use customized, nodesize batch value to update dirty_metadata_bytes. >>> We should also use batch version of compare function or we will easily >>> goto fast path and get false result from percpu_counter_compare(). >>> >>> Signed-off-by: Ethan Lien <ethanlien@synology.com> >> >> Reviewed-by: Nikolay Borisov <nborisov@suse.com> >> >> Also this needs the following tag: >> >> Fixes: e2d845211eda ("Btrfs: use percpu counter for dirty metadata count") > > I'll add the tags, thanks. > > 2939 fs_info->dirty_metadata_batch = nodesize * (1 + ilog2(nr_cpu_ids)); > > What a commit, sigh. Arbitrary constants without any explanation. I > don't quite understand why the ilog2(nr_cpui_ids) was chosen. If cpus > are onlined/offlined there's no adjustment. I'd think that there should > be some amount of data that are supposed to be dirtied per-cpu and this > value is independent of the number of cpus. That the online/offline > status is abstracted by the percpu API. I don't have time to study that > too deeply, but this is something that needs to be revisited. TBH, what is perhaps the safest approach would be to modify the code to eliminate the custom batch related variables/code. The code has been working like that for years. > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 205092dc9390..dfed08e70ec1 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -961,8 +961,9 @@ static int btree_writepages(struct address_space *mapping, fs_info = BTRFS_I(mapping->host)->root->fs_info; /* this is a bit racy, but that's ok */ - ret = percpu_counter_compare(&fs_info->dirty_metadata_bytes, - BTRFS_DIRTY_METADATA_THRESH); + ret = __percpu_counter_compare(&fs_info->dirty_metadata_bytes, + BTRFS_DIRTY_METADATA_THRESH, + fs_info->dirty_metadata_batch); if (ret < 0) return 0; } @@ -4150,8 +4151,9 @@ static void __btrfs_btree_balance_dirty(struct btrfs_fs_info *fs_info, if (flush_delayed) btrfs_balance_delayed_items(fs_info); - ret = percpu_counter_compare(&fs_info->dirty_metadata_bytes, - BTRFS_DIRTY_METADATA_THRESH); + ret = __percpu_counter_compare(&fs_info->dirty_metadata_bytes, + BTRFS_DIRTY_METADATA_THRESH, + fs_info->dirty_metadata_batch); if (ret > 0) { balance_dirty_pages_ratelimited(fs_info->btree_inode->i_mapping); }
We use customized, nodesize batch value to update dirty_metadata_bytes. We should also use batch version of compare function or we will easily goto fast path and get false result from percpu_counter_compare(). Signed-off-by: Ethan Lien <ethanlien@synology.com> --- fs/btrfs/disk-io.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)