diff mbox

[v7,4/6] Btrfs: heuristic add detection of repeated data patterns

Message ID 20170825091845.4120-5-nefelim4ag@gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Timofey Titovets Aug. 25, 2017, 9:18 a.m. UTC
Walk over data sample and use memcmp to detect
repeated data (like zeroed)

Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
---
 fs/btrfs/heuristic.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

--
2.14.1
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

David Sterba Sept. 27, 2017, 1:47 p.m. UTC | #1
On Fri, Aug 25, 2017 at 12:18:43PM +0300, Timofey Titovets wrote:
> Walk over data sample and use memcmp to detect
> repeated data (like zeroed)
> 
> Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
> ---
>  fs/btrfs/heuristic.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/fs/btrfs/heuristic.c b/fs/btrfs/heuristic.c
> index 5192e51ab81e..f1fa6e4f1c11 100644
> --- a/fs/btrfs/heuristic.c
> +++ b/fs/btrfs/heuristic.c
> @@ -66,6 +66,19 @@ static struct list_head *heuristic_alloc_workspace(void)
>  	return ERR_PTR(-ENOMEM);
>  }
> 
> +static bool sample_repeated_patterns(struct workspace *ws)
> +{
> +	u32 i = 0;
> +	u8 *p = ws->sample;
> +
> +	for (; i < ws->sample_size - READ_SIZE; i += READ_SIZE) {

Please use the inline initializatio of the iteration variables.

	for (i = 0; i < ws->sample_size - READ_SIZE; i += READ_SIZE) {

> +		if(memcpy(&p[i], &p[i + READ_SIZE], READ_SIZE))
> +			return false;

I think zeros are by far the most common repeated pattern, for that we
can use some speedy tests, built on top of "find first bit", possibly
with hardware support.

OTOH if it's a more generic "any repeated bytes", this might be actually
slower, and the heuristic needs to be as fast as possible. I don't want
to start optimizing now, so the generic pattern is ok for now.

> +	}
> +
> +	return true;
> +}
> +
>  static int heuristic(struct list_head *ws, struct inode *inode,
>  		     u64 start, u64 end)
>  {
> @@ -115,6 +128,9 @@ static int heuristic(struct list_head *ws, struct inode *inode,
> 
>  	workspace->sample_size = b;
> 
> +	if (sample_repeated_patterns(workspace))
> +		return 1;
> +
>  	memset(workspace->bucket, 0, sizeof(*workspace->bucket)*BUCKET_SIZE);
> 
>  	for (a = 0; a < workspace->sample_size; a++) {
> --
> 2.14.1
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/btrfs/heuristic.c b/fs/btrfs/heuristic.c
index 5192e51ab81e..f1fa6e4f1c11 100644
--- a/fs/btrfs/heuristic.c
+++ b/fs/btrfs/heuristic.c
@@ -66,6 +66,19 @@  static struct list_head *heuristic_alloc_workspace(void)
 	return ERR_PTR(-ENOMEM);
 }

+static bool sample_repeated_patterns(struct workspace *ws)
+{
+	u32 i = 0;
+	u8 *p = ws->sample;
+
+	for (; i < ws->sample_size - READ_SIZE; i += READ_SIZE) {
+		if(memcpy(&p[i], &p[i + READ_SIZE], READ_SIZE))
+			return false;
+	}
+
+	return true;
+}
+
 static int heuristic(struct list_head *ws, struct inode *inode,
 		     u64 start, u64 end)
 {
@@ -115,6 +128,9 @@  static int heuristic(struct list_head *ws, struct inode *inode,

 	workspace->sample_size = b;

+	if (sample_repeated_patterns(workspace))
+		return 1;
+
 	memset(workspace->bucket, 0, sizeof(*workspace->bucket)*BUCKET_SIZE);

 	for (a = 0; a < workspace->sample_size; a++) {