diff mbox series

[v2,5/6] p5313: add size comparison test

Message ID 999b1d094241b0ba8d6924ac6976eafc64c7d4a6.1726692382.git.gitgitgadget@gmail.com (mailing list archive)
State New
Headers show
Series pack-objects: create new name-hash algorithm | expand

Commit Message

Derrick Stolee Sept. 18, 2024, 8:46 p.m. UTC
From: Derrick Stolee <stolee@gmail.com>

As custom options are added to 'git pack-objects' and 'git repack' to
adjust how compression is done, use this new performance test script to
demonstrate their effectiveness in performance and size.

The recently-added --full-name-hash option swaps the default name-hash
algorithm with one that attempts to uniformly distribute the hashes
based on the full path name instead of the last 16 characters.

This has a dramatic effect on full repacks for repositories with many
versions of most paths. It can have a negative impact on cases such as
pushing a single change.

This can be seen by running pt5313 on the open source fluentui
repository [1]. Most commits will have this kind of output for the thin
and big pack cases, though certain commits (such as [2]) will have
problematic thin pack size for other reasons.

[1] https://github.com/microsoft/fluentui
[2] a637a06df05360ce5ff21420803f64608226a875

Checked out at the parent of [2], I see the following statistics:

Test                                           this tree
------------------------------------------------------------------
5313.2: thin pack                              0.02(0.01+0.01)
5313.3: thin pack size                                    1.1K
5313.4: thin pack with --full-name-hash        0.02(0.01+0.00)
5313.5: thin pack size with --full-name-hash              3.0K
5313.6: big pack                               1.65(3.35+0.24)
5313.7: big pack size                                    58.0M
5313.8: big pack with --full-name-hash         1.53(2.52+0.18)
5313.9: big pack size with --full-name-hash              57.6M
5313.10: repack                                176.52(706.60+3.53)
5313.11: repack size                                    446.7K
5313.12: repack with --full-name-hash          37.47(134.18+3.06)
5313.13: repack size with --full-name-hash              183.1K

Note that this demonstrates a 3x size _increase_ in the case that
simulates a small "git push". The size change is neutral on the case of
pushing the difference between HEAD and HEAD~1000.

However, the full repack case is both faster and more efficient.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
 t/perf/p5313-pack-objects.sh | 71 ++++++++++++++++++++++++++++++++++++
 1 file changed, 71 insertions(+)
 create mode 100755 t/perf/p5313-pack-objects.sh

Comments

Junio C Hamano Sept. 19, 2024, 9:58 p.m. UTC | #1
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <stolee@gmail.com>
>
> As custom options are added to 'git pack-objects' and 'git repack' to
> adjust how compression is done, use this new performance test script to
> demonstrate their effectiveness in performance and size.
>
> The recently-added --full-name-hash option swaps the default name-hash
> algorithm with one that attempts to uniformly distribute the hashes
> based on the full path name instead of the last 16 characters.
>
> This has a dramatic effect on full repacks for repositories with many
> versions of most paths. It can have a negative impact on cases such as
> pushing a single change.
>
> This can be seen by running pt5313 on the open source fluentui
> repository [1]. Most commits will have this kind of output for the thin
> and big pack cases, though certain commits (such as [2]) will have
> problematic thin pack size for other reasons.
>
> [1] https://github.com/microsoft/fluentui
> [2] a637a06df05360ce5ff21420803f64608226a875
>
> Checked out at the parent of [2], I see the following statistics:
>
> Test                                           this tree
> ------------------------------------------------------------------
> 5313.2: thin pack                              0.02(0.01+0.01)
> 5313.3: thin pack size                                    1.1K
> 5313.4: thin pack with --full-name-hash        0.02(0.01+0.00)
> 5313.5: thin pack size with --full-name-hash              3.0K
> 5313.6: big pack                               1.65(3.35+0.24)
> 5313.7: big pack size                                    58.0M
> 5313.8: big pack with --full-name-hash         1.53(2.52+0.18)
> 5313.9: big pack size with --full-name-hash              57.6M
> 5313.10: repack                                176.52(706.60+3.53)
> 5313.11: repack size                                    446.7K
> 5313.12: repack with --full-name-hash          37.47(134.18+3.06)
> 5313.13: repack size with --full-name-hash              183.1K
>
> Note that this demonstrates a 3x size _increase_ in the case that
> simulates a small "git push". The size change is neutral on the case of
> pushing the difference between HEAD and HEAD~1000.
>
> However, the full repack case is both faster and more efficient.

Nice.

> Signed-off-by: Derrick Stolee <stolee@gmail.com>
> ---
>  t/perf/p5313-pack-objects.sh | 71 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 71 insertions(+)
>  create mode 100755 t/perf/p5313-pack-objects.sh

"wc -c" -> "test_file_size" or "test-tool path-utils file-size"?
Derrick Stolee Sept. 23, 2024, 1:50 a.m. UTC | #2
On 9/19/24 5:58 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

>>   t/perf/p5313-pack-objects.sh | 71 ++++++++++++++++++++++++++++++++++++
>>   1 file changed, 71 insertions(+)
>>   create mode 100755 t/perf/p5313-pack-objects.sh
> 
> "wc -c" -> "test_file_size" or "test-tool path-utils file-size"?

Thanks. I was unfamiliar with those options.

I will squash this change into the next version. The use of a
variable for the packfile name was important at some point while
I was testing this with various repos on different platforms.

--- >8 ---

diff --git a/t/perf/p5313-pack-objects.sh b/t/perf/p5313-pack-objects.sh
index 5dcf52acb0..bf6f0d69e4 100755
--- a/t/perf/p5313-pack-objects.sh
+++ b/t/perf/p5313-pack-objects.sh
@@ -25,7 +25,7 @@ test_perf 'thin pack' '
  '

  test_size 'thin pack size' '
-	wc -c <out
+	test_file_size out
  '

  test_perf 'thin pack with --full-name-hash' '
@@ -33,7 +33,7 @@ test_perf 'thin pack with --full-name-hash' '
  '

  test_size 'thin pack size with --full-name-hash' '
-	wc -c <out
+	test_file_size out
  '

  test_perf 'big pack' '
@@ -41,7 +41,7 @@ test_perf 'big pack' '
  '

  test_size 'big pack size' '
-	wc -c <out
+	test_file_size out
  '

  test_perf 'big pack with --full-name-hash' '
@@ -49,7 +49,7 @@ test_perf 'big pack with --full-name-hash' '
  '

  test_size 'big pack size with --full-name-hash' '
-	wc -c <out
+	test_file_size out
  '

  test_perf 'repack' '
@@ -57,7 +57,8 @@ test_perf 'repack' '
  '

  test_size 'repack size' '
-	wc -c <.git/objects/pack/pack-*.pack
+	pack=$(ls .git/objects/pack/pack-*.pack) &&
+	test_file_size "$pack"
  '

  test_perf 'repack with --full-name-hash' '
@@ -65,7 +66,8 @@ test_perf 'repack with --full-name-hash' '
  '

  test_size 'repack size with --full-name-hash' '
-	wc -c <.git/objects/pack/pack-*.pack
+	pack=$(ls .git/objects/pack/pack-*.pack) &&
+	test_file_size "$pack"
  '

  test_done
Junio C Hamano Sept. 23, 2024, 4:14 p.m. UTC | #3
Derrick Stolee <stolee@gmail.com> writes:

> On 9/19/24 5:58 PM, Junio C Hamano wrote:
>> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>>>   t/perf/p5313-pack-objects.sh | 71 ++++++++++++++++++++++++++++++++++++
>>>   1 file changed, 71 insertions(+)
>>>   create mode 100755 t/perf/p5313-pack-objects.sh
>> "wc -c" -> "test_file_size" or "test-tool path-utils file-size"?
>
> Thanks. I was unfamiliar with those options.

I was, too ;-).  There are a few other test pieces that use "wc -c"
to count bytes, but it made me a bit nervous to use it to count
bytes in a binary file.

>  test_size 'repack size' '
> -	wc -c <.git/objects/pack/pack-*.pack
> +	pack=$(ls .git/objects/pack/pack-*.pack) &&
> +	test_file_size "$pack"
>  '

I am perfectly fine (and actually even prefer) to pay the cost to
fork "ls" here, but if you really wanted to avoid it, we could do
something like

	pack=$(
		set x .git/objects/pack/pack-*.pack &&
		test $# = 2 && echo "$2" || exit 1
	) &&
	test_file_size "$pack"

;-)

>  test_perf 'repack with --full-name-hash' '
> @@ -65,7 +66,8 @@ test_perf 'repack with --full-name-hash' '
>  '
>
>  test_size 'repack size with --full-name-hash' '
> -	wc -c <.git/objects/pack/pack-*.pack
> +	pack=$(ls .git/objects/pack/pack-*.pack) &&
> +	test_file_size "$pack"
>  '
>
>  test_done
diff mbox series

Patch

diff --git a/t/perf/p5313-pack-objects.sh b/t/perf/p5313-pack-objects.sh
new file mode 100755
index 00000000000..5dcf52acb0d
--- /dev/null
+++ b/t/perf/p5313-pack-objects.sh
@@ -0,0 +1,71 @@ 
+#!/bin/sh
+
+test_description='Tests pack performance using bitmaps'
+. ./perf-lib.sh
+
+GIT_TEST_PASSING_SANITIZE_LEAK=0
+export GIT_TEST_PASSING_SANITIZE_LEAK
+
+test_perf_large_repo
+
+test_expect_success 'create rev input' '
+	cat >in-thin <<-EOF &&
+	$(git rev-parse HEAD)
+	^$(git rev-parse HEAD~1)
+	EOF
+
+	cat >in-big <<-EOF
+	$(git rev-parse HEAD)
+	^$(git rev-parse HEAD~1000)
+	EOF
+'
+
+test_perf 'thin pack' '
+	git pack-objects --thin --stdout --revs --sparse  <in-thin >out
+'
+
+test_size 'thin pack size' '
+	wc -c <out
+'
+
+test_perf 'thin pack with --full-name-hash' '
+	git pack-objects --thin --stdout --revs --sparse --full-name-hash <in-thin >out
+'
+
+test_size 'thin pack size with --full-name-hash' '
+	wc -c <out
+'
+
+test_perf 'big pack' '
+	git pack-objects --stdout --revs --sparse  <in-big >out
+'
+
+test_size 'big pack size' '
+	wc -c <out
+'
+
+test_perf 'big pack with --full-name-hash' '
+	git pack-objects --stdout --revs --sparse --full-name-hash <in-big >out
+'
+
+test_size 'big pack size with --full-name-hash' '
+	wc -c <out
+'
+
+test_perf 'repack' '
+	git repack -adf
+'
+
+test_size 'repack size' '
+	wc -c <.git/objects/pack/pack-*.pack
+'
+
+test_perf 'repack with --full-name-hash' '
+	git repack -adf --full-name-hash
+'
+
+test_size 'repack size with --full-name-hash' '
+	wc -c <.git/objects/pack/pack-*.pack
+'
+
+test_done