diff mbox series

[v2,7/8] repack: honor `-l` when calculating pack geometry

Message ID 608dde4ad52c28ef42845b6bfdcb168e252bd29b.1681294715.git.ps@pks.im (mailing list archive)
State Superseded
Headers show
Series repack: fix geometric repacking with gitalternates | expand

Commit Message

Patrick Steinhardt April 12, 2023, 10:22 a.m. UTC
When the user passes `-l` to git-repack(1), then they essentially ask us
to only repack objects part of the local object database while ignoring
any packfiles part of an alternate object database. And we in fact honor
this bit when doing a geometric repack as the resulting packfile will
only ever contain local objects.

What we're missing though is that we don't take locality of packfiles
into account when computing whether the geometric sequence is intact or
not. So even though we would only ever roll up local packfiles anyway,
we could end up trying to repack because of non-local packfiles. This
does not make much sense, and in the worst case it can cause us to try
and do the geometric repack over and over again because we're never able
to restore the geometric sequence.

Fix this bug by honoring whether the user has passed `-l`. If so, we
skip adding any non-local packfiles to the pack geometry.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/repack.c            | 13 ++++++--
 t/t7703-repack-geometric.sh | 59 +++++++++++++++++++++++++++++++++++++
 2 files changed, 70 insertions(+), 2 deletions(-)

Comments

Junio C Hamano April 12, 2023, 11:56 p.m. UTC | #1
Patrick Steinhardt <ps@pks.im> writes:

> +	# Note that we are using stat(1) to verify idempotence to also verify
> +	# that the mtime did not change. This is done in order to detect the
> +	# case where we do repack objects, but the resulting packfile is the
> +	# same.
> +	stat member/.git/objects/pack/* >expected-member-packs &&
> +	git -C member repack --geometric=2 -l -d &&
> +	stat member/.git/objects/pack/* >actual-member-packs &&

These tests that use stat(1) are very likely to be non-portable.

Worse yet, even when stat(1) is available, when running

    $ cd t && sh t7703-repack-geometric.sh --root=/dev/shm/testpen -i -v

in my environment (based on Debian), the lists fail to match at
subseconds.

--- expected-member-packs       2023-04-12 23:48:52.817035290 +0000
+++ actual-member-packs 2023-04-12 23:48:52.833036370 +0000
@@ -2,7 +2,7 @@
   Size: 1156           Blocks: 8          IO Block: 4096   regular file
 Device: 0,25   Inode: 2330515     Links: 1
 Access: (0400/-r--------)  Uid: (1001/   gitster)   Gid: (1001/gitster)
-Access: 2023-04-12 23:48:52.801034210 +0000
+Access: 2023-04-12 23:48:52.829036100 +0000
 Modify: 2023-04-12 23:48:52.773032320 +0000
 Change: 2023-04-12 23:48:52.773032320 +0000
  Birth: 2023-04-12 23:48:52.773032320 +0000
@@ -10,7 +10,7 @@
   Size: 276            Blocks: 8          IO Block: 4096   regular file
 Device: 0,25   Inode: 2330514     Links: 1
 Access: (0400/-r--------)  Uid: (1001/   gitster)   Gid: (1001/gitster)
-Access: 2023-04-12 23:48:52.781032860 +0000
+Access: 2023-04-12 23:48:52.829036100 +0000
 Modify: 2023-04-12 23:48:52.773032320 +0000
 Change: 2023-04-12 23:48:52.773032320 +0000
  Birth: 2023-04-12 23:48:52.769032050 +0000
Junio C Hamano April 13, 2023, 5:11 a.m. UTC | #2
Junio C Hamano <gitster@pobox.com> writes:

> Worse yet, even when stat(1) is available, when running
>
>     $ cd t && sh t7703-repack-geometric.sh --root=/dev/shm/testpen -i -v
>
> in my environment (based on Debian), the lists fail to match at
> subseconds.
> ...

A CI run of 'seen' with this topic fails on macOS:

  https://github.com/git/git/actions/runs/4683726291/jobs/8299125722#step:4:2090

Almost the same 'seen' without this topic is CI clean:

  https://github.com/git/git/actions/runs/4683470969
Patrick Steinhardt April 13, 2023, 6:41 a.m. UTC | #3
On Wed, Apr 12, 2023 at 10:11:06PM -0700, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
> > Worse yet, even when stat(1) is available, when running
> >
> >     $ cd t && sh t7703-repack-geometric.sh --root=/dev/shm/testpen -i -v
> >
> > in my environment (based on Debian), the lists fail to match at
> > subseconds.
> > ...
> 
> A CI run of 'seen' with this topic fails on macOS:
> 
>   https://github.com/git/git/actions/runs/4683726291/jobs/8299125722#step:4:2090
> 
> Almost the same 'seen' without this topic is CI clean:
> 
>   https://github.com/git/git/actions/runs/4683470969
> 

Thanks, I'll come up with an alternative in v3 of this patch series.

Patrick
diff mbox series

Patch

diff --git a/builtin/repack.c b/builtin/repack.c
index 80d4e813c8..f57869f14a 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -325,7 +325,8 @@  static int geometry_cmp(const void *va, const void *vb)
 }
 
 static void init_pack_geometry(struct pack_geometry **geometry_p,
-			       struct string_list *existing_kept_packs)
+			       struct string_list *existing_kept_packs,
+			       const struct pack_objects_args *args)
 {
 	struct packed_git *p;
 	struct pack_geometry *geometry;
@@ -335,6 +336,14 @@  static void init_pack_geometry(struct pack_geometry **geometry_p,
 	geometry = *geometry_p;
 
 	for (p = get_all_packs(the_repository); p; p = p->next) {
+		if (args->local && !p->pack_local)
+			/*
+			 * When asked to only repack local packfiles we skip
+			 * over any packfiles that are borrowed from alternate
+			 * object directories.
+			 */
+			continue;
+
 		if (!pack_kept_objects) {
 			/*
 			 * Any pack that has its pack_keep bit set will appear
@@ -897,7 +906,7 @@  int cmd_repack(int argc, const char **argv, const char *prefix)
 	if (geometric_factor) {
 		if (pack_everything)
 			die(_("options '%s' and '%s' cannot be used together"), "--geometric", "-A/-a");
-		init_pack_geometry(&geometry, &existing_kept_packs);
+		init_pack_geometry(&geometry, &existing_kept_packs, &po_args);
 		split_pack_geometry(geometry, geometric_factor);
 	}
 
diff --git a/t/t7703-repack-geometric.sh b/t/t7703-repack-geometric.sh
index 0a2f2bd46c..96c8d4cdfa 100755
--- a/t/t7703-repack-geometric.sh
+++ b/t/t7703-repack-geometric.sh
@@ -10,6 +10,12 @@  objdir=.git/objects
 packdir=$objdir/pack
 midx=$objdir/pack/multi-pack-index
 
+packed_objects() {
+	git show-index <"$1" >tmp-object-list &&
+	cut -d' ' -f2 tmp-object-list &&
+	rm tmp-object-list
+ }
+
 test_expect_success '--geometric with no packs' '
 	git init geometric &&
 	test_when_finished "rm -fr geometric" &&
@@ -361,4 +367,57 @@  test_expect_success '--geometric with same pack in main and alternate ODB' '
 	test_cmp expected-files actual-files
 '
 
+test_expect_success '--geometric -l with non-intact geometric sequence across ODBs' '
+	test_when_finished "rm -fr shared member" &&
+
+	git init shared &&
+	test_commit_bulk -C shared --start=1 1 &&
+
+	git clone --shared shared member &&
+	test_commit_bulk -C member --start=2 1 &&
+
+	# Verify that our assumptions actually hold: both generated packfiles
+	# should have three objects and should be non-equal.
+	packed_objects shared/.git/objects/pack/pack-*.idx >packed-objects &&
+	test_line_count = 3 packed-objects &&
+	packed_objects member/.git/objects/pack/pack-*.idx >packed-objetcs &&
+	test_line_count = 3 packed-objects &&
+	test "$(basename member/.git/objects/pack/pack-*.pack)" != "$(basename shared/.git/objects/pack/pack-*.pack)" &&
+
+	# Perform the geometric repack. With `-l`, we should only see the local
+	# packfile and thus arrive at the conclusion that the geometric
+	# sequence is intact. We thus expect no changes.
+	#
+	# Note that we are using stat(1) to verify idempotence to also verify
+	# that the mtime did not change. This is done in order to detect the
+	# case where we do repack objects, but the resulting packfile is the
+	# same.
+	stat member/.git/objects/pack/* >expected-member-packs &&
+	git -C member repack --geometric=2 -l -d &&
+	stat member/.git/objects/pack/* >actual-member-packs &&
+	test_cmp expected-member-packs actual-member-packs &&
+
+	(
+		packed_objects shared/.git/objects/pack/pack-*.idx &&
+		packed_objects member/.git/objects/pack/pack-*.idx
+	) | sort >expected-objects &&
+
+	# On the other hand, when doing a non-local geometric repack we should
+	# see both packfiles and thus repack them. We expect that the shared
+	# object database was not changed.
+	stat shared/.git/objects/pack/* >expected-shared-packs &&
+	git -C member repack --geometric=2 -d &&
+	stat shared/.git/objects/pack/* >actual-shared-packs &&
+	test_cmp expected-shared-packs actual-shared-packs &&
+
+	# Furthermore, we expect that the member repository now has a single
+	# packfile that contains the combined shared and non-shared objects.
+	ls member/.git/objects/pack/pack-*.idx >actual &&
+	test_line_count = 1 actual &&
+	packed_objects member/.git/objects/pack/pack-*.idx >actual-objects &&
+	test_line_count = 6 actual-objects &&
+	sort <actual-objects >actual-objects.sorted &&
+	test_cmp expected-objects actual-objects.sorted
+'
+
 test_done