Message ID | 999b1d094241b0ba8d6924ac6976eafc64c7d4a6.1726692382.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | pack-objects: create new name-hash algorithm | expand |
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Derrick Stolee <stolee@gmail.com> > > As custom options are added to 'git pack-objects' and 'git repack' to > adjust how compression is done, use this new performance test script to > demonstrate their effectiveness in performance and size. > > The recently-added --full-name-hash option swaps the default name-hash > algorithm with one that attempts to uniformly distribute the hashes > based on the full path name instead of the last 16 characters. > > This has a dramatic effect on full repacks for repositories with many > versions of most paths. It can have a negative impact on cases such as > pushing a single change. > > This can be seen by running pt5313 on the open source fluentui > repository [1]. Most commits will have this kind of output for the thin > and big pack cases, though certain commits (such as [2]) will have > problematic thin pack size for other reasons. > > [1] https://github.com/microsoft/fluentui > [2] a637a06df05360ce5ff21420803f64608226a875 > > Checked out at the parent of [2], I see the following statistics: > > Test this tree > ------------------------------------------------------------------ > 5313.2: thin pack 0.02(0.01+0.01) > 5313.3: thin pack size 1.1K > 5313.4: thin pack with --full-name-hash 0.02(0.01+0.00) > 5313.5: thin pack size with --full-name-hash 3.0K > 5313.6: big pack 1.65(3.35+0.24) > 5313.7: big pack size 58.0M > 5313.8: big pack with --full-name-hash 1.53(2.52+0.18) > 5313.9: big pack size with --full-name-hash 57.6M > 5313.10: repack 176.52(706.60+3.53) > 5313.11: repack size 446.7K > 5313.12: repack with --full-name-hash 37.47(134.18+3.06) > 5313.13: repack size with --full-name-hash 183.1K > > Note that this demonstrates a 3x size _increase_ in the case that > simulates a small "git push". The size change is neutral on the case of > pushing the difference between HEAD and HEAD~1000. > > However, the full repack case is both faster and more efficient. Nice. > Signed-off-by: Derrick Stolee <stolee@gmail.com> > --- > t/perf/p5313-pack-objects.sh | 71 ++++++++++++++++++++++++++++++++++++ > 1 file changed, 71 insertions(+) > create mode 100755 t/perf/p5313-pack-objects.sh "wc -c" -> "test_file_size" or "test-tool path-utils file-size"?
On 9/19/24 5:58 PM, Junio C Hamano wrote: > "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: >> t/perf/p5313-pack-objects.sh | 71 ++++++++++++++++++++++++++++++++++++ >> 1 file changed, 71 insertions(+) >> create mode 100755 t/perf/p5313-pack-objects.sh > > "wc -c" -> "test_file_size" or "test-tool path-utils file-size"? Thanks. I was unfamiliar with those options. I will squash this change into the next version. The use of a variable for the packfile name was important at some point while I was testing this with various repos on different platforms. --- >8 --- diff --git a/t/perf/p5313-pack-objects.sh b/t/perf/p5313-pack-objects.sh index 5dcf52acb0..bf6f0d69e4 100755 --- a/t/perf/p5313-pack-objects.sh +++ b/t/perf/p5313-pack-objects.sh @@ -25,7 +25,7 @@ test_perf 'thin pack' ' ' test_size 'thin pack size' ' - wc -c <out + test_file_size out ' test_perf 'thin pack with --full-name-hash' ' @@ -33,7 +33,7 @@ test_perf 'thin pack with --full-name-hash' ' ' test_size 'thin pack size with --full-name-hash' ' - wc -c <out + test_file_size out ' test_perf 'big pack' ' @@ -41,7 +41,7 @@ test_perf 'big pack' ' ' test_size 'big pack size' ' - wc -c <out + test_file_size out ' test_perf 'big pack with --full-name-hash' ' @@ -49,7 +49,7 @@ test_perf 'big pack with --full-name-hash' ' ' test_size 'big pack size with --full-name-hash' ' - wc -c <out + test_file_size out ' test_perf 'repack' ' @@ -57,7 +57,8 @@ test_perf 'repack' ' ' test_size 'repack size' ' - wc -c <.git/objects/pack/pack-*.pack + pack=$(ls .git/objects/pack/pack-*.pack) && + test_file_size "$pack" ' test_perf 'repack with --full-name-hash' ' @@ -65,7 +66,8 @@ test_perf 'repack with --full-name-hash' ' ' test_size 'repack size with --full-name-hash' ' - wc -c <.git/objects/pack/pack-*.pack + pack=$(ls .git/objects/pack/pack-*.pack) && + test_file_size "$pack" ' test_done
Derrick Stolee <stolee@gmail.com> writes: > On 9/19/24 5:58 PM, Junio C Hamano wrote: >> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > >>> t/perf/p5313-pack-objects.sh | 71 ++++++++++++++++++++++++++++++++++++ >>> 1 file changed, 71 insertions(+) >>> create mode 100755 t/perf/p5313-pack-objects.sh >> "wc -c" -> "test_file_size" or "test-tool path-utils file-size"? > > Thanks. I was unfamiliar with those options. I was, too ;-). There are a few other test pieces that use "wc -c" to count bytes, but it made me a bit nervous to use it to count bytes in a binary file. > test_size 'repack size' ' > - wc -c <.git/objects/pack/pack-*.pack > + pack=$(ls .git/objects/pack/pack-*.pack) && > + test_file_size "$pack" > ' I am perfectly fine (and actually even prefer) to pay the cost to fork "ls" here, but if you really wanted to avoid it, we could do something like pack=$( set x .git/objects/pack/pack-*.pack && test $# = 2 && echo "$2" || exit 1 ) && test_file_size "$pack" ;-) > test_perf 'repack with --full-name-hash' ' > @@ -65,7 +66,8 @@ test_perf 'repack with --full-name-hash' ' > ' > > test_size 'repack size with --full-name-hash' ' > - wc -c <.git/objects/pack/pack-*.pack > + pack=$(ls .git/objects/pack/pack-*.pack) && > + test_file_size "$pack" > ' > > test_done
diff --git a/t/perf/p5313-pack-objects.sh b/t/perf/p5313-pack-objects.sh new file mode 100755 index 00000000000..5dcf52acb0d --- /dev/null +++ b/t/perf/p5313-pack-objects.sh @@ -0,0 +1,71 @@ +#!/bin/sh + +test_description='Tests pack performance using bitmaps' +. ./perf-lib.sh + +GIT_TEST_PASSING_SANITIZE_LEAK=0 +export GIT_TEST_PASSING_SANITIZE_LEAK + +test_perf_large_repo + +test_expect_success 'create rev input' ' + cat >in-thin <<-EOF && + $(git rev-parse HEAD) + ^$(git rev-parse HEAD~1) + EOF + + cat >in-big <<-EOF + $(git rev-parse HEAD) + ^$(git rev-parse HEAD~1000) + EOF +' + +test_perf 'thin pack' ' + git pack-objects --thin --stdout --revs --sparse <in-thin >out +' + +test_size 'thin pack size' ' + wc -c <out +' + +test_perf 'thin pack with --full-name-hash' ' + git pack-objects --thin --stdout --revs --sparse --full-name-hash <in-thin >out +' + +test_size 'thin pack size with --full-name-hash' ' + wc -c <out +' + +test_perf 'big pack' ' + git pack-objects --stdout --revs --sparse <in-big >out +' + +test_size 'big pack size' ' + wc -c <out +' + +test_perf 'big pack with --full-name-hash' ' + git pack-objects --stdout --revs --sparse --full-name-hash <in-big >out +' + +test_size 'big pack size with --full-name-hash' ' + wc -c <out +' + +test_perf 'repack' ' + git repack -adf +' + +test_size 'repack size' ' + wc -c <.git/objects/pack/pack-*.pack +' + +test_perf 'repack with --full-name-hash' ' + git repack -adf --full-name-hash +' + +test_size 'repack size with --full-name-hash' ' + wc -c <.git/objects/pack/pack-*.pack +' + +test_done