[5/9] test-mergesort: add unriffle mode

Message ID	fbeacc0e-d870-2aef-3177-9b661ffc6768@web.de (mailing list archive)
State	New, archived
Headers	show Return-Path: <git-owner@kernel.org> Subject: [PATCH 5/9] test-mergesort: add unriffle mode From: =?utf-8?q?Ren=C3=A9_Scharfe?= <l.s.r@web.de> To: Git List <git@vger.kernel.org> Cc: Junio C Hamano <gitster@pobox.com> References: <943b1e01-465e-5def-a766-0adf667690de@web.de> Message-ID: <fbeacc0e-d870-2aef-3177-9b661ffc6768@web.de> Date: Fri, 1 Oct 2021 11:16:49 +0200 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Thunderbird/78.14.0 MIME-Version: 1.0 In-Reply-To: <943b1e01-465e-5def-a766-0adf667690de@web.de> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: quoted-printable Precedence: bulk
Series	mergesort: improve tests and performance \| expand [0/9] mergesort: improve tests and performance [1/9] test-mergesort: use strbuf_getline() [2/9] test-mergesort: add sort subcommand [3/9] test-mergesort: add test subcommand [4/9] test-mergesort: add generate subcommand [5/9] test-mergesort: add unriffle mode [6/9] test-mergesort: add unriffle_skewed mode [7/9] p0071: measure sorting of already sorted and reversed files [8/9] p0071: test performance of llist_mergesort() [9/9] mergesort: use ranks stack [10/9,v2] test-mergesort: use repeatable random numbers

Message ID

fbeacc0e-d870-2aef-3177-9b661ffc6768@web.de (mailing list archive)

State

New, archived

Headers

Subject: [PATCH 5/9] test-mergesort: add unriffle mode
From: =?utf-8?q?Ren=C3=A9_Scharfe?= <l.s.r@web.de>
To: Git List <git@vger.kernel.org>
Cc: Junio C Hamano <gitster@pobox.com>
References: <943b1e01-465e-5def-a766-0adf667690de@web.de>
Message-ID: <fbeacc0e-d870-2aef-3177-9b661ffc6768@web.de>
Date: Fri, 1 Oct 2021 11:16:49 +0200
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0)
 Gecko/20100101 Thunderbird/78.14.0
MIME-Version: 1.0
In-Reply-To: <943b1e01-465e-5def-a766-0adf667690de@web.de>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: quoted-printable
Precedence: bulk

Series

mergesort: improve tests and performance | expand

Commit Message

René Scharfe Oct. 1, 2021, 9:16 a.m. UTC

Add a mode that turns sorted items into adversarial input for mergesort.
Do that by running mergesort in reverse and rearranging the items in
such a way that each merge needs the maximum number of operations to
undo it.

To riffle is a card shuffling technique and involves splitting a deck
into two and then to interleave them.  A perfect riffle takes one card
from each half in turn.  That's similar to the most expensive merge,
which has to take one item from each sublist in turn, which requires the
maximum number of comparisons (n-1).

So unriffle does that in reverse, i.e. it generates the first sublist
out of the items at even indexes and the second sublist out of the items
at odd indexes, without changing their order in any other way.  Done
recursively until we reach the trivial sublist length of one, this
twists the list into an order that requires the maximum effort for
mergesort to untangle.

As a baseline, here are the rand distributions with the highest number
of comparisons from "test-tool mergesort test":

   $ t/helper/test-tool mergesort test | awk '
      NR > 1 && $1 != "rand" {next}
      $7 > max[$3] {max[$3] = $7; line[$3] = $0}
      END {for (n in line) print line[n]}
   '

distribut mode                    n        m get_next set_next  compare verdict
rand      copy                  100       32     1184      700      569 OK
rand      reverse_1st_half     1023      256    16373    10230     8976 OK
rand      reverse_1st_half     1024      512    16384    10240     8993 OK
rand      dither               1025       64    18454    11275     9970 OK

And here are the most expensive ones overall:

   $ t/helper/test-tool mergesort test | awk '
      $7 > max[$3] {max[$3] = $7; line[$3] = $0}
      END {for (n in line) print line[n]}
   '

distribut mode                    n        m get_next set_next  compare verdict
stagger   reverse               100       64     1184      700      580 OK
sawtooth  unriffle             1023     1024    16373    10230     9179 OK
sawtooth  unriffle             1024     1024    16384    10240     9217 OK
stagger   unriffle             1025     2048    18454    11275    10241 OK

The sawtooth distribution with m>=n generates a sorted list.  The
unriffle mode is designed to turn that into adversarial input for
mergesort, and that checks out for n=1023 and n=1024, where it produces
the list that requires the most comparisons.

Item counts that are not powers of two have other winners, and that's
because unriffle recursively splits lists into equal-sized halves, while
llist_mergesort() splits them into the biggest power of two smaller than
n and the rest, e.g. for n=1025 it sorts the first 1024 separately and
finally merges them to the last item.

So unriffle mode works as designed for the intended use case, but to
consistently generate adversarial input for unbalanced merges we need
something else.

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 t/helper/test-mergesort.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

--
2.33.0

diff --git a/t/helper/test-mergesort.c b/t/helper/test-mergesort.c
index 27ba252d4a..d71ef568f3 100644
--- a/t/helper/test-mergesort.c
+++ b/t/helper/test-mergesort.c
@@ -150,6 +150,34 @@  static void mode_dither(int *arr, int n)
 		arr[i] += i % 5;
 }

+static void unriffle(int *arr, int n, int *tmp)
+{
+	int i, j;
+	COPY_ARRAY(tmp, arr, n);
+	for (i = j = 0; i < n; i += 2)
+		arr[j++] = tmp[i];
+	for (i = 1; i < n; i += 2)
+		arr[j++] = tmp[i];
+}
+
+static void unriffle_recursively(int *arr, int n, int *tmp)
+{
+	if (n > 1) {
+		int half = n / 2;
+		unriffle(arr, n, tmp);
+		unriffle_recursively(arr, half, tmp);
+		unriffle_recursively(arr + half, n - half, tmp);
+	}
+}
+
+static void mode_unriffle(int *arr, int n)
+{
+	int *tmp;
+	ALLOC_ARRAY(tmp, n);
+	unriffle_recursively(arr, n, tmp);
+	free(tmp);
+}
+
 #define MODE(name) { #name, mode_##name }

 static struct mode {
@@ -162,6 +190,7 @@  static struct mode {
 	MODE(reverse_2nd_half),
 	MODE(sort),
 	MODE(dither),
+	MODE(unriffle),
 };

 static const struct mode *get_mode_by_name(const char *name)

[5/9] test-mergesort: add unriffle mode

Commit Message

Patch