diff mbox series

[3/7] midx.c: prevent `expire` from removing the cruft pack

Message ID 3ae9903d2df491e291ab975c56ec78aa13d95655.1663638929.git.me@ttaylorr.com (mailing list archive)
State Accepted
Commit 757d457907e3efa8eb911b772a690661cd432da5
Series midx: ignore cruft pack with `repack`, `expire`

Commit Message

Taylor Blau Sept. 20, 2022, 1:55 a.m. UTC
The `expire` sub-command unlinks any packs that are (a) contained in the
MIDX, but (b) have no objects referenced by the MIDX.

This sub-command ignores `.keep` packs, which remain on-disk even if
they have no objects referenced by the MIDX. Cruft packs, however,
aren't given the same treatment: if none of the objects contained in the
cruft pack are selected from the cruft pack by the MIDX, then the cruft
pack is eligible to be expired.

This is less than desirable, since the cruft pack has important
metadata about the individual object mtimes, which is useful to
determine how quickly an object should age out of the repository when
pruning.

Ordinarily, we wouldn't expect the contents of a cruft pack to be
duplicated across non-cruft packs (and we'd expect to see the MIDX
select all cruft objects from other sources even less often). But
nonetheless, it is still possible to trick the `expire` sub-command into
removing the `.mtimes` file in this circumstance.

Teach the `expire` sub-command to ignore cruft packs in the same manner
as it does `.keep` packs, in order to keep their metadata around, even
when they are unreferenced by the MIDX.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-multi-pack-index.txt |  4 ++--
 midx.c                                 |  2 +-
 t/t5319-multi-pack-index.sh            | 30 ++++++++++++++++++++++++++
 3 files changed, 33 insertions(+), 3 deletions(-)
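
For readers who want the expiry rule from the commit message in one place, here is a minimal standalone sketch. It is not the code in midx.c: `struct pack_model`, `count_refs()` and `should_expire()` are hypothetical names made up for illustration, and only the `pack_keep` and `is_cruft` field names are taken from the patch. It assumes the MIDX can be modelled as an array recording which pack each object was selected from.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the two pack fields the patch tests. */
struct pack_model {
	bool pack_keep;	/* pack has a .keep file */
	bool is_cruft;	/* pack has a .mtimes file, i.e. it is a cruft pack */
};

/*
 * Count how many MIDX objects are selected from each pack.
 * selected_pack[i] is the index of the pack that supplies object i.
 */
void count_refs(const uint32_t *selected_pack, size_t nr_objects,
		uint32_t *count, size_t nr_packs)
{
	size_t i;

	for (i = 0; i < nr_packs; i++)
		count[i] = 0;
	for (i = 0; i < nr_objects; i++)
		count[selected_pack[i]]++;
}

/*
 * A pack is eligible for removal only if the MIDX selects no objects
 * from it and it is neither a .keep pack nor (after this patch) a
 * cruft pack.
 */
bool should_expire(const struct pack_model *p, uint32_t refs)
{
	if (refs)
		return false;
	if (p->pack_keep || p->is_cruft)
		return false;
	return true;
}
```

In this model, a cruft pack whose objects are all selected from another (say, preferred) pack ends up with a count of zero, and it is only the `is_cruft` check that keeps it, and therefore its `.mtimes` file, on disk.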

Comments

Derrick Stolee Sept. 22, 2022, 2:08 p.m. UTC | #1
On 9/19/2022 9:55 PM, Taylor Blau wrote:
> The `expire` sub-command unlinks any packs that are (a) contained in the
> MIDX, but (b) have no objects referenced by the MIDX.

It is important to note that this can only happen if all objects in
the pack have duplicates in other pack-files.
 
> This sub-command ignores `.keep` packs, which remain on-disk even if
> they have no objects referenced by the MIDX. Cruft packs, however,
> aren't given the same treatment: if none of the objects contained in the
> cruft pack are selected from the cruft pack by the MIDX, then the cruft
> pack is eligible to be expired.
> 
> This is less than desireable, since the cruft pack has important

s/desireable/desirable/
(according to my spell-checker)

> metadata about the individual object mtimes, which is useful to
> determine how quickly an object should age out of the repository when
> pruning.
>
> Ordinarily, we wouldn't expect the contents of a cruft pack to
> duplicated across non-cruft packs (and we'd expect to see the MIDX
> select all cruft objects from other sources even less often). But
> nonetheless, it is still possible to trick the `expire` sub-command into
> removing the `.mtimes` file in this circumstance.

I was initially unconvinced that this scenario was super-critical
to keeping the .mtimes file, but I was able to think of cases where
objects are duplicated out of the cruft pack due to de-thinning or
otherwise inefficiently packing unreachable objects into these
other pack-files.

Thanks,
-Stolee

Patch

diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt
index 11e6dc53e3..3696506eb3 100644
--- a/Documentation/git-multi-pack-index.txt
+++ b/Documentation/git-multi-pack-index.txt
@@ -72,8 +72,8 @@  verify::
 expire::
 	Delete the pack-files that are tracked by the MIDX file, but
 	have no objects referenced by the MIDX (with the exception of
-	`.keep` packs). Rewrite the MIDX file afterward to remove all
-	references to these pack-files.
+	`.keep` packs and cruft packs). Rewrite the MIDX file afterward
+	to remove all references to these pack-files.
 
 repack::
 	Create a new pack-file containing objects in small pack-files
diff --git a/midx.c b/midx.c
index c27d0e5f15..bff5b99933 100644
--- a/midx.c
+++ b/midx.c
@@ -1839,7 +1839,7 @@  int expire_midx_packs(struct repository *r, const char *object_dir, unsigned fla
 		if (prepare_midx_pack(r, m, i))
 			continue;
 
-		if (m->packs[i]->pack_keep)
+		if (m->packs[i]->pack_keep || m->packs[i]->is_cruft)
 			continue;
 
 		pack_name = xstrdup(m->packs[i]->pack_name);
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index afbe93f162..2d51b09680 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -847,6 +847,36 @@  test_expect_success 'expire respects .keep files' '
 	)
 '
 
+test_expect_success 'expiring unreferenced cruft pack retains pack' '
+	git init repo &&
+	test_when_finished "rm -fr repo" &&
+	(
+		cd repo &&
+
+		test_commit base &&
+		test_commit --no-tag unreachable &&
+		unreachable=$(git rev-parse HEAD) &&
+
+		git reset --hard base &&
+		git reflog expire --all --expire=all &&
+		git repack --cruft -d &&
+		mtimes="$(ls $objdir/pack/pack-*.mtimes)" &&
+
+		echo "base..$unreachable" >in &&
+		pack="$(git pack-objects --revs --delta-base-offset \
+			$objdir/pack/pack <in)" &&
+
+		# Preferring the contents of "$pack" will leave the
+		# cruft pack unreferenced (ie., none of the objects
+		# contained in the cruft pack will have their MIDX copy
+		# selected from the cruft pack).
+		git multi-pack-index write --preferred-pack="pack-$pack.pack" &&
+		git multi-pack-index expire &&
+
+		test_path_is_file "$mtimes"
+	)
+'
+
 test_expect_success 'repack --batch-size=0 repacks everything' '
 	cp -r dup dup2 &&
 	(