diff mbox series

[v2] builtin/repack: fix `--keep-unreachable` when there are no packs

Message ID 20250204-b4-pks-repack-unreachable-objects-wo-packfiles-v2-1-1eae23366711@pks.im (mailing list archive)
State Accepted
Commit 414c82300abf8d1f4c8ce7bacc68f3848bdb27f4
Headers show
Series [v2] builtin/repack: fix `--keep-unreachable` when there are no packs | expand

Commit Message

Patrick Steinhardt Feb. 4, 2025, 7 a.m. UTC
The "--keep-unreachable" flag is supposed to append any unreachable
objects to the newly written pack. This flag is explicitly documented as
appending both packed and loose unreachable objects to the new packfile.
And while this works alright when repacking with preexisting packfiles,
it stops working when the repository does not have any packfiles at all.

The root cause are the conditions used to decide whether or not we want
to append "--pack-loose-unreachable" to git-pack-objects(1). There are
a couple of conditions here:

  - `has_existing_non_kept_packs()` checks whether there are existing
    packfiles. This condition makes sense to guard "--keep-pack=",
    "--unpack-unreachable" and "--keep-unreachable", because all of
    these flags only make sense in combination with existing packfiles.
    But it does not make sense to disable `--pack-loose-unreachable`
    when there aren't any preexisting packfiles, as loose objects can be
    packed into the new packfile regardless of that.

  - `delete_redundant` checks whether we want to delete any objects or
    packs that are about to become redundant. The documentation of
    `--keep-unreachable` explicitly says that `git repack -ad` needs to
    be executed for the flag to have an effect.

    It is not immediately obvious why such redundant objects need to be
    deleted in order for "--pack-unreachable-objects" to be effective.
    But as things are working as documented this is nothing we'll change
    for now.

  - `pack_everything & PACK_CRUFT` checks that we're not creating a
    cruft pack. This condition makes sense in the context of
    "--pack-loose-unreachable", as unreachable objects would end up in
    the cruft pack anyway.

So while the second and third condition are sensible, it does not make
any sense to condition `--pack-loose-unreachable` on the existence of
packfiles.

Fix the bug by splitting out the "--pack-loose-unreachable" and only
making it depend on the second and third condition. Like this, loose
unreachable objects will be packed regardless of any preexisting
packfiles.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
Hi,

this small patch series fixes `git repack -ad --keep-unreachable` when
there aren't any preexisting packfiles.

Changes in v2:
  - Merge tests into t7701.
  - Link to v1: https://lore.kernel.org/r/20250203-b4-pks-repack-unreachable-objects-wo-packfiles-v1-0-7c4d69c5072c@pks.im

Thanks!

Patrick
---
 builtin/repack.c                     |  5 ++++-
 t/t7701-repack-unpack-unreachable.sh | 16 ++++++++++++++++
 2 files changed, 20 insertions(+), 1 deletion(-)


---
base-commit: 3b0d05c4a79d0e441283680a864529b02dca5f08
change-id: 20250203-b4-pks-repack-unreachable-objects-wo-packfiles-33c26066e5c3

Comments

Jeff King Feb. 4, 2025, 3:22 p.m. UTC | #1
On Tue, Feb 04, 2025 at 08:00:41AM +0100, Patrick Steinhardt wrote:

> this small patch series fixes `git repack -ad --keep-unreachable` when
> there aren't any preexisting packfiles.
> 
> Changes in v2:
>   - Merge tests into t7701.
>   - Link to v1: https://lore.kernel.org/r/20250203-b4-pks-repack-unreachable-objects-wo-packfiles-v1-0-7c4d69c5072c@pks.im

This looks good to me.

One interesting thing I did notice:

> +test_expect_success 'repack -k packs unreachable loose objects without existing packfiles' '
> +	test_when_finished "rm -rf repo" &&
> +	git init repo &&
> +	(
> +		cd repo &&
> +
> +		oid=$(echo would-be-deleted-loose | git hash-object -w --stdin) &&
> +		objpath=.git/objects/$(echo $sha1 | sed "s,..,&/,") &&
> +		test_path_is_file $objpath &&
> +
> +		git repack -ad --keep-unreachable &&
> +		test_path_is_missing $objpath &&
> +		git cat-file -p $oid
> +	)
> +'

In the test in v1, we had reachable commits to pack. And here we don't.
So before your patch, the behavior in the v1 test was that we'd create a
new pack, but it wouldn't pick up the loose object. But the behavior of
this test is that we say "Nothing new to pack".

I originally thought that output meant that we were not running
pack-objects at all. But looking at builtin/repack.c, we do run it, and
it simply chooses not to make a pack (which makes sense; how would
repack even realize if there was stuff to pack, since pack-objects is
what does the traversal).

So the two outcomes are both the result of the same bug. In both cases
we do not correctly pack the loose objects, so whether we make a pack is
just a question of whether there was other reachable stuff to pack. And
since your patch is fixing the bug at its root, both outcomes are fixed.

And when I suggested in my response to v1 that "Nothing new to pack" in
an empty repo was a separate bug, I was just wrong. ;) There is nothing
else to fix after your patch.

Thanks for finding and fixing.

-Peff
Patrick Steinhardt Feb. 5, 2025, 5:36 a.m. UTC | #2
On Tue, Feb 04, 2025 at 10:22:36AM -0500, Jeff King wrote:
> On Tue, Feb 04, 2025 at 08:00:41AM +0100, Patrick Steinhardt wrote:
> 
> > this small patch series fixes `git repack -ad --keep-unreachable` when
> > there aren't any preexisting packfiles.
> > 
> > Changes in v2:
> >   - Merge tests into t7701.
> >   - Link to v1: https://lore.kernel.org/r/20250203-b4-pks-repack-unreachable-objects-wo-packfiles-v1-0-7c4d69c5072c@pks.im
> 
> This looks good to me.
> 
> One interesting thing I did notice:
> 
> > +test_expect_success 'repack -k packs unreachable loose objects without existing packfiles' '
> > +	test_when_finished "rm -rf repo" &&
> > +	git init repo &&
> > +	(
> > +		cd repo &&
> > +
> > +		oid=$(echo would-be-deleted-loose | git hash-object -w --stdin) &&
> > +		objpath=.git/objects/$(echo $sha1 | sed "s,..,&/,") &&
> > +		test_path_is_file $objpath &&
> > +
> > +		git repack -ad --keep-unreachable &&
> > +		test_path_is_missing $objpath &&
> > +		git cat-file -p $oid
> > +	)
> > +'
> 
> In the test in v1, we had reachable commits to pack. And here we don't.
> So before your patch, the behavior in the v1 test was that we'd create a
> new pack, but it wouldn't pick up the loose object. But the behavior of
> this test is that we say "Nothing new to pack".
> 
> I originally thought that output meant that we were not running
> pack-objects at all. But looking at builtin/repack.c, we do run it, and
> it simply chooses not to make a pack (which makes sense; how would
> repack even realize if there was stuff to pack, since pack-objects is
> what does the traversal).
> 
> So the two outcomes are both the result of the same bug. In both cases
> we do not correctly pack the loose objects, so whether we make a pack is
> just a question of whether there was other reachable stuff to pack. And
> since your patch is fixing the bug at its root, both outcomes are fixed.
> 
> And when I suggested in my response to v1 that "Nothing new to pack" in
> an empty repo was a separate bug, I was just wrong. ;) There is nothing
> else to fix after your patch.
> 
> Thanks for finding and fixing.

Thanks for your thorough review!

Patrick
diff mbox series

Patch

diff --git a/builtin/repack.c b/builtin/repack.c
index 81d13630ea..8194344b04 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -1370,9 +1370,12 @@  int cmd_repack(int argc,
 					    "--unpack-unreachable");
 			} else if (keep_unreachable) {
 				strvec_push(&cmd.args, "--keep-unreachable");
-				strvec_push(&cmd.args, "--pack-loose-unreachable");
 			}
 		}
+
+		if (keep_unreachable && delete_redundant &&
+		    !(pack_everything & PACK_CRUFT))
+			strvec_push(&cmd.args, "--pack-loose-unreachable");
 	} else if (geometry.split_factor) {
 		strvec_push(&cmd.args, "--stdin-packs");
 		strvec_push(&cmd.args, "--unpacked");
diff --git a/t/t7701-repack-unpack-unreachable.sh b/t/t7701-repack-unpack-unreachable.sh
index 5715f4d69a..5559d4ccb4 100755
--- a/t/t7701-repack-unpack-unreachable.sh
+++ b/t/t7701-repack-unpack-unreachable.sh
@@ -195,4 +195,20 @@  test_expect_success 'repack -k packs unreachable loose objects' '
 	git cat-file -p $sha1
 '
 
+test_expect_success 'repack -k packs unreachable loose objects without existing packfiles' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+
+		oid=$(echo would-be-deleted-loose | git hash-object -w --stdin) &&
+		objpath=.git/objects/$(echo $sha1 | sed "s,..,&/,") &&
+		test_path_is_file $objpath &&
+
+		git repack -ad --keep-unreachable &&
+		test_path_is_missing $objpath &&
+		git cat-file -p $oid
+	)
+'
+
 test_done