diff mbox series

[v4,5/8] p5303: measure time to repack with keep

Message ID 94e4f3ee3af3181c3805ee397d043e343038005a.1614047097.git.me@ttaylorr.com
State New, archived
Series repack: support repacking into a geometric sequence

Commit Message

Taylor Blau Feb. 23, 2021, 2:25 a.m. UTC
From: Jeff King <peff@peff.net>

Add two new tests to measure repack performance. Both tests split the
repository into synthetic "pushes", and then leave the remaining objects
in a big base pack.

The first new test marks an empty pack as "kept" and then passes
--honor-pack-keep to avoid including objects in it. That doesn't change
the resulting pack, but it does let us compare to the normal repack case
to see how much overhead we add to check whether objects are kept or
not.

The other test is of --stdin-packs, which gives us a sense of how that
number scales based on the number of packs we provide as input. In each
of those tests, the empty pack isn't considered, but the residual pack
(objects that were left over and not included in one of the synthetic
push packs) is marked as kept.

(Note that in the single-pack case of the --stdin-packs test, there is
nothing to do, since there are no non-excluded packs.)
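As a rough sketch of the input these tests feed to --stdin-packs: one
pack basename per line, with a leading "^" on the pack to exclude (here,
the residual base pack). The dummy files and the pack names aaa, bbb and
ccc are made up for illustration; no real packs are involved, and the
helper name build_stdin_packs is hypothetical.

```shell
# Hypothetical stand-ins for real packs; the names are made up.
staging=$(mktemp -d)
touch "$staging/pack-aaa.pack" "$staging/pack-bbb.pack" "$staging/pack-ccc.pack"
base_pack=aaa	# pretend this is the residual (base) pack

build_stdin_packs () {
	# every pack except the base pack is listed as-is ...
	find "$1" -type f -name 'pack-*.pack' |
		xargs -n 1 basename | grep -v "$2" | sort
	# ... and the base pack is listed last, excluded with a leading "^"
	printf '^pack-%s.pack\n' "$2"
}

packs_list=$(build_stdin_packs "$staging" "$base_pack")
printf '%s\n' "$packs_list"
rm -rf "$staging"
```

The sort is only there to make the sketch's output stable; the test
script itself does not sort the list.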

Here are some timings on a recent clone of the kernel. With a single
pack, honoring the kept pack adds no measurable overhead:

  5303.5: repack (1)                          57.26(54.59+10.84)
  5303.6: repack with kept (1)                57.33(54.80+10.51)

In the 50-pack case, things start to slow down:

  5303.11: repack (50)                        71.54(88.57+4.84)
  5303.12: repack with kept (50)              85.12(102.05+4.94)

and by the time we hit 1,000 packs, things are substantially worse, even
though the resulting pack produced is the same:

  5303.17: repack (1000)                      216.87(490.79+14.57)
  5303.18: repack with kept (1000)            665.63(938.87+15.76)

That's because the code paths around handling .keep files are known to
scale badly; they look in every single pack file to find each object.
Our solution to that was to notice that most repos don't have keep
files, and to make that case a fast path. But as soon as you add a
single .keep, that part of pack-objects slows down again (even if we
have fewer objects total to look at).
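To put the "repack" and "repack with kept" timings above in proportion,
the wall-clock ratios can be worked out with a few lines of awk. This is
only arithmetic on the numbers already quoted; the variable names are
illustrative.

```shell
# Wall-clock seconds copied from the timings above:
# "<number of packs> <repack> <repack with kept>"
timings='1 57.26 57.33
50 71.54 85.12
1000 216.87 665.63'

# slowdown = (time with kept) / (time without), per pack count
slowdown=$(printf '%s\n' "$timings" |
	awk '{ printf "%d: %.2fx\n", $1, $3 / $2 }')
printf '%s\n' "$slowdown"
```

That works out to roughly 1.00x, 1.19x and 3.07x for 1, 50 and 1,000
packs respectively.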

Likewise, the scaling is pretty extreme on --stdin-packs (but each
subsequent test is also being asked to do more work):

  5303.7: repack with --stdin-packs (1)       0.01(0.01+0.00)
  5303.13: repack with --stdin-packs (50)     3.53(12.07+0.24)
  5303.19: repack with --stdin-packs (1000)   195.83(371.82+8.10)

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/perf/p5303-many-packs.sh | 34 ++++++++++++++++++++++++++++++++--
 1 file changed, 32 insertions(+), 2 deletions(-)

Patch

diff --git a/t/perf/p5303-many-packs.sh b/t/perf/p5303-many-packs.sh
index d90d714923..35c0cbdf49 100755
--- a/t/perf/p5303-many-packs.sh
+++ b/t/perf/p5303-many-packs.sh
@@ -31,8 +31,15 @@  repack_into_n () {
 	' "$1" >pushes &&
 
 	# create base packfile
-	head -n 1 pushes |
-	git pack-objects --delta-base-offset --revs staging/pack &&
+	base_pack=$(
+		head -n 1 pushes |
+		git pack-objects --delta-base-offset --revs staging/pack
+	) &&
+	test_export base_pack &&
+
+	# create an empty packfile
+	empty_pack=$(git pack-objects staging/pack </dev/null) &&
+	test_export empty_pack &&
 
 	# and then incrementals between each pair of commits
 	last= &&
@@ -49,6 +56,12 @@  repack_into_n () {
 		last=$rev
 	done <pushes &&
 
+	(
+		find staging -type f -name 'pack-*.pack' |
+			xargs -n 1 basename | grep -v "$base_pack" &&
+		printf "^pack-%s.pack\n" $base_pack
+	) >stdin.packs &&
+
 	# and install the whole thing
 	rm -f .git/objects/pack/* &&
 	mv staging/* .git/objects/pack/
@@ -91,6 +104,23 @@  do
 		  --reflog --indexed-objects --delta-base-offset \
 		  --stdout </dev/null >/dev/null
 	'
+
+	test_perf "repack with kept ($nr_packs)" '
+		git pack-objects --keep-true-parents \
+		  --keep-pack=pack-$empty_pack.pack \
+		  --honor-pack-keep --non-empty --all \
+		  --reflog --indexed-objects --delta-base-offset \
+		  --stdout </dev/null >/dev/null
+	'
+
+	test_perf "repack with --stdin-packs ($nr_packs)" '
+		git pack-objects \
+		  --keep-true-parents \
+		  --stdin-packs \
+		  --non-empty \
+		  --delta-base-offset \
+		  --stdout <stdin.packs >/dev/null
+	'
 done
 
 # Measure pack loading with 10,000 packs.