@@ -119,6 +119,10 @@ sizes (e.g., removable media that cannot store the whole repository),
you are likely better off creating a single large packfile and splitting
it using a generic multi-volume archive tool (e.g., Unix `split`).
+
+When generating cruft packs with `git pack-objects`, this option has a
+slightly different interpretation than above; see the documentation for
+`--max-pack-size` option in linkgit:git-pack-objects[1].
++
The minimum size allowed is limited to 1 MiB. The default is unlimited.
Common unit suffixes of 'k', 'm', or 'g' are supported.
@@ -161,6 +161,10 @@ depth is 4095.
`pack.packSizeLimit` is set. Note that this option may result in
a larger and slower repository; see the discussion in
`pack.packSizeLimit`.
++
+When used with `--cruft`, the output packfile(s) may be as large or
+larger than the configured maximum size. The pack will exceed the
+specified maximum by no more than one object.
--honor-pack-keep::
This flag causes an object already in a local pack that
@@ -692,11 +692,39 @@ static off_t write_object(struct hashfile *f,
off_t len;
int usable_delta, to_reuse;
+ if (cruft && pack_size_limit && pack_size_limit <= write_offset) {
+ /*
+ * When writing a cruft pack with a limited size,
+ * perform the --max-pack-size check *before* writing
+ * the object.
+ *
+ * When we have not yet reached the size limit, this
+ * combined with the fact that we act as if there is no
+ * limit when writing objects via write_object() allows
+ * us to grow one object *past* the specified limit.
+ *
+ * This is important for generating cruft packs with a
+ * --max-pack-size so we can generate packs that are
+ * just over the threshold to avoid repacking them in
+ * the future.
+ */
+ return 0;
+ }
+
if (!pack_to_stdout)
crc32_begin(f);
- /* apply size limit if limited packsize and not first object */
- if (!pack_size_limit || !nr_written)
+ /*
+ * Apply size limit when one is provided, with the following
+ * exceptions:
+ *
+ * - We are writing the first object.
+ *
+ * - We are writing a cruft pack with a size limit. The check
+ * above covers this case while letting the pack grow at most
+ * one object beyond the limit.
+ */
+ if (!pack_size_limit || !nr_written || cruft)
limit = 0;
else if (pack_size_limit <= write_offset)
/*
@@ -3,6 +3,7 @@
test_description='cruft pack related pack-objects tests'
. ./test-lib.sh
+. "$TEST_DIRECTORY"/lib-cruft.sh
objdir=.git/objects
packdir=$objdir/pack
@@ -695,4 +696,70 @@ test_expect_success 'additional cruft blobs via gc.recentObjectsHook' '
)
'
+test_expect_success 'cruft pack generation beyond --max-pack-size' '
+ test_when_finished "rm -fr repo" &&
+ git init repo &&
+ (
+ cd repo &&
+
+ # Disable pack compression to ensure the pack size is
+ # predictable.
+ git config pack.compression 0 &&
+
+ sz=524288 && # 0.5 MiB
+ foo="$(generate_random_blob foo $sz)" &&
+ bar="$(generate_random_blob bar $sz)" &&
+ baz="$(generate_random_blob baz $sz)" &&
+ quux="$(generate_random_blob quux $sz)" &&
+
+ printf "%s\n" "$foo" "$bar" >A.objects &&
+ printf "%s\n" "$baz" "$quux" >B.objects &&
+
+ A="$(git pack-objects $packdir/pack <A.objects)" &&
+ B="$(git pack-objects $packdir/pack <B.objects)" &&
+
+ git prune-packed &&
+
+ sz=1572864 && # 1.5 MiB
+ printf -- "-%s\n" "pack-$A.pack" "pack-$B.pack" >C.in &&
+ git pack-objects --cruft --max-pack-size=$sz $packdir/pack \
+ <C.in >C.out &&
+
+ test_line_count = 2 C.out &&
+ C_large="$(head -n 1 C.out)" &&
+ C_small="$(tail -n 1 C.out)" &&
+
+ # Swap $C_large and $C_small if necessary.
+ if test "$(test_file_size $packdir/pack-$C_large.idx)" -lt \
+ "$(test_file_size $packdir/pack-$C_small.idx)"
+ then
+ tmp="$C_large" &&
+ C_large="$C_small" &&
+ C_small="$tmp"
+ fi &&
+
+ # Ensure the large pack is no smaller than the threshold
+ # such that it does not get repacked in subsequent runs
+ # with the same --max-pack-size setting.
+ test $(test_file_size $packdir/pack-$C_large.pack) -ge $sz &&
+
+ {
+ git show-index <"$packdir/pack-$C_large.idx" &&
+ git show-index <"$packdir/pack-$C_small.idx"
+ } >actual.raw &&
+ printf "%s\n" "$foo" "$bar" "$baz" "$quux" >expect.raw &&
+
+ sort <expect.raw >expect &&
+ cut -d " " -f 2 actual.raw | sort >actual &&
+
+ # Ensure that all of the objects are present in the two
+ # cruft packs we just generated.
+ #
+ # Note that the contents of "actual" are not
+ # de-duplicated. This is intentional to ensure we avoid
+ # packing the same object twice (once in each pack).
+ test_cmp expect actual
+ )
+'
+
test_done
When generating multiple cruft packs with 'git repack --max-cruft-size', we use 'git pack-objects --cruft --max-pack-size' (with many other elided options), filling in the '--max-pack-size' value with whatever was provided via the '--max-cruft-size' flag. This causes us to generate a pack that is smaller than the specified threshold. This poses a problem since we will never be able to generate a cruft pack that crosses the threshold. In effect, this means that we will try and repack its contents over and over again. Instead, change the meaning of '--max-pack-size' in pack-objects when combined with '--cruft'. When put together, '--max-pack-size' allows the pack to grow larger than the specified threshold, but only by one additional object. This allows cruft packs to become just a little bit larger than the threshold, allowing cruft packs to accumulate past the set threshold and avoid being repacked in the future until a pruning GC takes place. Noticed-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Taylor Blau <me@ttaylorr.com> --- Documentation/config/pack.adoc | 4 ++ Documentation/git-pack-objects.adoc | 4 ++ builtin/pack-objects.c | 32 +++++++++++++- t/t5329-pack-objects-cruft.sh | 67 +++++++++++++++++++++++++++++ 4 files changed, 105 insertions(+), 2 deletions(-)