diff mbox

[v8,0/6] Btrfs: populate heuristic with code

Message ID CAGqmi77Tvuf-XZP7ObgQBhGox++Vzm1P4ELHJYNDbeVF4OGq9A@mail.gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Timofey Titovets Oct. 23, 2017, 6:36 p.m. UTC
2017-10-22 16:44 GMT+03:00 Timofey Titovets <nefelim4ag@gmail.com>:
> 2017-10-20 16:45 GMT+03:00 David Sterba <dsterba@suse.cz>:
>> On Fri, Oct 20, 2017 at 01:48:01AM +0300, Timofey Titovets wrote:
>>> 2017-10-19 18:39 GMT+03:00 David Sterba <dsterba@suse.cz>:
>>> > On Fri, Sep 29, 2017 at 06:22:00PM +0200, David Sterba wrote:
>>> >> On Thu, Sep 28, 2017 at 05:33:35PM +0300, Timofey Titovets wrote:
>>> >> > Compile tested, hand tested on live system
>>> >> >
>>> >> > Change v7 -> v8
>>> >> >   - All code moved to compression.c (again)
>>> >> >   - Heuristic workspaces inmplemented another way
>>> >> >     i.e. only share logic with compression workspaces
>>> >> >   - Some style fixes suggested by Devid
>>> >> >   - Move sampling function from heuristic code
>>> >> >     (I'm afraid of big functions)
>>> >> >   - Much more comments and explanations
>>> >>
>>> >> Thanks for the update, I went through the patches and they looked good
>>> >> enough to be put into for-next. I may have more comments about a few
>>> >> things, but nothing serious that would hinder testing.
>>> >
>>> > I did a final pass through the patches and edited comments wehre I was
>>> > not able to undrerstand them. Please check the updated patches in [1] if
>>> > I did not accidentally change the meaning.
>>>
>>> I don't see a link [1] in mail, may be you missed it?
>>
>> Yeah, sorry:
>> https://github.com/kdave/btrfs-devel/commits/ext/timofey/heuristic
>
> I did re-read updated comments, looks ok to me
> (i only found one typo, leave a comment).
>
>
> Thanks
> --
> Have a nice day,
> Timofey.

Can you please try that patch? (in attach)

I think some time about performance hit of heuristic and
how to avoid using sorting,

That patch will try prefind min/max values (before sorting) in array,
and (max - min), used to filter edge data cases where
byte core size < 64 or bigger > 200
It's a bit hacky workaround =\,
That show a ~same speedup on my data set as show using of radix sort.
(i.e. x2 speed up)

Thanks.
diff mbox

Patch

From fb2a329828e64ad0e224a8cb97dbc17147149629 Mon Sep 17 00:00:00 2001
From: Timofey Titovets <nefelim4ag@gmail.com>
Date: Mon, 23 Oct 2017 21:24:29 +0300
Subject: [PATCH] Btrfs: heuristic try avoid bucket sorting on edge data cases

Heap sort used in kernel are too slow and costly,
So let's make some statistic assume about egde input data cases
Based on observation of difference between min/max values in bucket.

Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
---
 fs/btrfs/compression.c | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 0ca16909894e..56b67ec4fb5b 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -1310,8 +1310,46 @@  static int byte_core_set_size(struct heuristic_ws *ws)
 	u32 i;
 	u32 coreset_sum = 0;
 	const u32 core_set_threshold = ws->sample_size * 90 / 100;
+	struct bucket_item *max, *min;
+	struct bucket_item tmp;
 	struct bucket_item *bucket = ws->bucket;
 
+
+	/* Presort for find min/max value */
+	max = &bucket[0];
+	min = &bucket[BUCKET_SIZE - 1];
+	for (i = 1; i < BUCKET_SIZE - 1; i++) {
+		if (bucket[i].count > max->count) {
+			tmp = *max;
+			*max = bucket[i];
+			bucket[i] = tmp;
+		}
+		if (bucket[i].count < min->count) {
+			tmp = *min;
+			*min = bucket[i];
+			bucket[i] = tmp;
+		}
+	}
+
+	/*
+	 * Hacks for avoid sorting on Edge data cases (sorting too constly)
+	 * i.e. that will fast filter easy compressible
+	 * and bad compressible data
+	 * Based on observation of number distribution on different data sets
+	 *
+	 * Assume 1: For bad compressible data distribution between min/max
+	 * will be less then 0.6% of sample size
+	 *
+	 * Assume 2: For good compressible data distribution between min/max
+	 * will be far bigger then 4% of sample size
+	 */
+
+	if (max->count - min->count < ws->sample_size * 6 / 1000)
+		return BYTE_CORE_SET_HIGH + 1;
+
+	if (max->count - min->count > ws->sample_size * 4 / 100)
+		return BYTE_CORE_SET_LOW - 1;
+
 	/* Sort in reverse order */
 	sort(bucket, BUCKET_SIZE, sizeof(*bucket), &bucket_comp_rev, NULL);
 
-- 
2.14.2