[8/8] pseudo-merge.c: ensure pseudo-merge groups are closed

Message ID c9a64b1d2a9d6b3fe1f5fb0a7303e043114fcd8f.1723743050.git.me@ttaylorr.com (mailing list archive)
State Accepted
Commit a72dfab8b8bcccee06d7bf53e5c0323e82a1765a
Series pseudo-merge: avoid empty and non-closed pseudo-merge commits

Commit Message

Taylor Blau Aug. 15, 2024, 5:31 p.m. UTC
When generating pseudo-merge bitmaps, it's possible that concurrent
reference updates may reveal some pseudo-merge candidates which reach
objects that are not contained in the bitmap's pack or pseudo-pack
order (in the case of MIDX bitmaps).

The latter case is relatively easy to demonstrate: if we generate a MIDX
bitmap with only half of the repository packed, then the unpacked
contents are not part of the MIDX's object order.

If we happen to select one or more commit(s) from the unpacked portion
of the repository for inclusion in a pseudo-merge, we'll get the
following message when trying to generate its bitmap:

    $ git multi-pack-index write --bitmap
    [...]
    Selecting pseudo-merge commits: 100% (1/1), done.
    warning: Failed to write bitmap index. Packfile doesn't have full closure (object ... is missing)
    Building bitmaps:  50% (1/2), done.
    error: could not write multi-pack bitmap

and the attempted bitmap write fails, leaving the repository without a
current bitmap.

Rectify this by ensuring that the commits which are pseudo-merge
candidates can only be so if they appear somewhere in the packing order.

This is sufficient, since we know that the original packing order is
closed under reachability, so if a commit appears in that list as a
potential pseudo-merge candidate, we know that everything reachable from
it also appears in the list (and thus the candidate is a good one).

Noticed-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pseudo-merge.c                  |  2 ++
 t/t5333-pseudo-merge-bitmaps.sh | 36 +++++++++++++++++++++++++++++++++
 2 files changed, 38 insertions(+)

Comments

Jeff King Aug. 17, 2024, 10:43 a.m. UTC | #1
On Thu, Aug 15, 2024 at 01:31:20PM -0400, Taylor Blau wrote:

> Rectify this by ensuring that the commits which are pseudo-merge
> candidates can only be so if they appear somewhere in the packing order.
> 
> This is sufficient, since we know that the original packing order is
> closed under reachability, so if a commit appears in that list as a
> potential pseudo-merge candidate, we know that everything reachable from
> it also appears in the list (and thus the candidate is a good one).

Right, good explanation.

> diff --git a/pseudo-merge.c b/pseudo-merge.c
> index 6422be979c..7ec9d4c51c 100644
> --- a/pseudo-merge.c
> +++ b/pseudo-merge.c
> @@ -217,6 +217,8 @@ static int find_pseudo_merge_group_for_ref(const char *refname,
>  	c = lookup_commit(the_repository, oid);
>  	if (!c)
>  		return 0;
> +	if (!packlist_find(writer->to_pack, oid))
> +		return 0;
>  
>  	has_bitmap = bitmap_writer_has_bitmapped_object_id(writer, oid);
>  

And the patch looks good. I wondered about checking the packlist before
calling lookup_commit(), but the latter is really not very expensive (it
is not reading the object, but just creating a struct).

> +test_expect_success 'pseudo-merge closure' '
> +	git init pseudo-merge-closure &&
> +	(
> +		cd pseudo-merge-closure &&
> +
> +		test_commit A &&
> +		git repack -d &&
> +
> +		test_commit B &&
> +
> +		# Note that the contents of A is packed, but B is not. A
> +		# (and the objects reachable from it) are thus visible
> +		# to the MIDX, but the same is not true for B and its
> +		# objects.
> +		#
> +		# Ensure that we do not attempt to create a pseudo-merge
> +		# for B, despite it matching the below pseudo-merge
> +		# group pattern, as doing so would result in a failure
> +		# to write a non-closed bitmap.
> +		git config bitmapPseudoMerge.test.pattern refs/ &&
> +		git config bitmapPseudoMerge.test.threshold now &&
> +
> +		git multi-pack-index write --bitmap &&

OK, clever. In the real world, I think this would happen racily, because
you'd usually suck up all of the loose objects into a pack to feed into
the midx. And the problem is new objects (whether packed or not) that
are referenced after that step.

But here we just skip that step and generate the midx directly, which
lets us do it deterministically.

-Peff

Patch

diff --git a/pseudo-merge.c b/pseudo-merge.c
index 6422be979c..7ec9d4c51c 100644
--- a/pseudo-merge.c
+++ b/pseudo-merge.c
@@ -217,6 +217,8 @@ static int find_pseudo_merge_group_for_ref(const char *refname,
 	c = lookup_commit(the_repository, oid);
 	if (!c)
 		return 0;
+	if (!packlist_find(writer->to_pack, oid))
+		return 0;
 
 	has_bitmap = bitmap_writer_has_bitmapped_object_id(writer, oid);
 
diff --git a/t/t5333-pseudo-merge-bitmaps.sh b/t/t5333-pseudo-merge-bitmaps.sh
index aa1a7d26f1..1dd6284756 100755
--- a/t/t5333-pseudo-merge-bitmaps.sh
+++ b/t/t5333-pseudo-merge-bitmaps.sh
@@ -410,4 +410,40 @@ test_expect_success 'empty pseudo-merge group' '
 	)
 '
 
+test_expect_success 'pseudo-merge closure' '
+	git init pseudo-merge-closure &&
+	(
+		cd pseudo-merge-closure &&
+
+		test_commit A &&
+		git repack -d &&
+
+		test_commit B &&
+
+		# Note that the contents of A is packed, but B is not. A
+		# (and the objects reachable from it) are thus visible
+		# to the MIDX, but the same is not true for B and its
+		# objects.
+		#
+		# Ensure that we do not attempt to create a pseudo-merge
+		# for B, despite it matching the below pseudo-merge
+		# group pattern, as doing so would result in a failure
+		# to write a non-closed bitmap.
+		git config bitmapPseudoMerge.test.pattern refs/ &&
+		git config bitmapPseudoMerge.test.threshold now &&
+
+		git multi-pack-index write --bitmap &&
+
+		test-tool bitmap dump-pseudo-merges >pseudo-merges &&
+		test_line_count = 1 pseudo-merges &&
+
+		git rev-parse A >expect &&
+
+		test-tool bitmap list-commits >actual &&
+		test_cmp expect actual &&
+		test-tool bitmap dump-pseudo-merge-commits 0 >actual &&
+		test_cmp expect actual
+	)
+'
+
 test_done