diff mbox series

[2/8] builtin/repack.c: extract marking packs for deletion

Message ID 313537ef68892b15e772eaad8937a4a8c7ebbe61.1693946195.git.me@ttaylorr.com (mailing list archive)
State Superseded
Headers show
Series repack: refactor pack snapshot-ing logic | expand

Commit Message

Taylor Blau Sept. 5, 2023, 8:36 p.m. UTC
At the end of a repack (when given `-d`), Git attempts to remove any
packs which have been made "redundant" as a result of the repacking
operation. For example, an all-into-one (`-A` or `-a`) repack makes
every pre-existing pack which is not marked as kept redundant. Geometric
repacks (with `--geometric=<n>`) make any packs which were rolled up
redundant, and so on.

But before deleting the set of packs we think are redundant, we first
check to see whether or not we just wrote a pack which is identical to
any one of the packs we were going to delete. When this is the case, Git
must avoid deleting that pack, since it matches a pack we just wrote
(so deleting it may cause the repository to become corrupt).

Right now we only process the list of non-kept packs in a single pass.
But a future change will split the existing non-kept packs further into
two lists: one for cruft packs, and another for non-cruft packs.

Factor out this routine to prepare for calling it twice on two separate
lists in a future patch.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/repack.c | 50 +++++++++++++++++++++++++++++++-----------------
 1 file changed, 32 insertions(+), 18 deletions(-)

Comments

Jeff King Sept. 7, 2023, 7:59 a.m. UTC | #1
On Tue, Sep 05, 2023 at 04:36:43PM -0400, Taylor Blau wrote:

> At the end of a repack (when given `-d`), Git attempts to remove any
> packs which have been made "redundant" as a result of the repacking
> operation. For example, an all-into-one (`-A` or `-a`) repack makes
> every pre-existing pack which is not marked as kept redundant. Geometric
> repacks (with `--geometric=<n>`) make any packs which were rolled up
> redundant, and so on.
> 
> But before deleting the set of packs we think are redundant, we first
> check to see whether or not we just wrote a pack which is identical to
> any one of the packs we were going to delete. When this is the case, Git
> must avoid deleting that pack, since it matches a pack we just wrote
> (so deleting it may cause the repository to become corrupt).
> 
> Right now we only process the list of non-kept packs in a single pass.
> But a future change will split the existing non-kept packs further into
> two lists: one for cruft packs, and another for non-cruft packs.
> 
> Factor out this routine to prepare for calling it twice on two separate
> lists in a future patch.

There are really two "factor outs" here: we pull the code from
cmd_repack() into a helper, and then the helper is also just a thin
wrapper around its "_1" variant. That latter part isn't needed yet, but
I can guess from your description that we'll eventually have the main
function dispatch to the "_1" helper for lists.

The main caller in cmd_repack() could do that double-dispatch, but then
this code:

> +	if (delete_redundant && pack_everything & ALL_INTO_ONE)
> +		mark_packs_for_deletion(&existing, &names);

would know more about how "existing_packs" work than it needs to. So
this seems like a good split (and the two-liner above is making
cmd_repack() much more readable than the big loop it had in the
pre-image).

-Peff
Taylor Blau Sept. 7, 2023, 10:10 p.m. UTC | #2
On Thu, Sep 07, 2023 at 03:59:31AM -0400, Jeff King wrote:
> There are really two "factor outs" here: we pull the code from
> cmd_repack() into a helper, and then the helper is also just a thin
> wrapper around its "_1" variant. That latter part isn't needed yet, but
> I can guess from your description that we'll eventually have the main
> function dispatch to the "_1" helper for lists.

Yeah... the "_1" variant looks ugly in isolation in this patch, but I
think makes things cleaner in subsequent patches.

Thanks,
Taylor
diff mbox series

Patch

diff --git a/builtin/repack.c b/builtin/repack.c
index c3ab89912e..708556836e 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -105,6 +105,36 @@  struct existing_packs {
 	.non_kept_packs = STRING_LIST_INIT_DUP, \
 }
 
+static void mark_packs_for_deletion_1(struct string_list *names,
+				      struct string_list *list)
+{
+	struct string_list_item *item;
+	const int hexsz = the_hash_algo->hexsz;
+
+	for_each_string_list_item(item, list) {
+		char *sha1;
+		size_t len = strlen(item->string);
+		if (len < hexsz)
+			continue;
+		sha1 = item->string + len - hexsz;
+		/*
+		 * Mark this pack for deletion, which ensures that this
+		 * pack won't be included in a MIDX (if `--write-midx`
+		 * was given) and that we will actually delete this pack
+		 * (if `-d` was given).
+		 */
+		if (!string_list_has_string(names, sha1))
+			item->util = (void*)(uintptr_t)((size_t)item->util | DELETE_PACK);
+	}
+}
+
+static void mark_packs_for_deletion(struct existing_packs *existing,
+				    struct string_list *names)
+
+{
+	mark_packs_for_deletion_1(names, &existing->non_kept_packs);
+}
+
 static void existing_packs_release(struct existing_packs *existing)
 {
 	string_list_clear(&existing->kept_packs, 0);
@@ -1141,24 +1171,8 @@  int cmd_repack(int argc, const char **argv, const char *prefix)
 	}
 	/* End of pack replacement. */
 
-	if (delete_redundant && pack_everything & ALL_INTO_ONE) {
-		const int hexsz = the_hash_algo->hexsz;
-		for_each_string_list_item(item, &existing.non_kept_packs) {
-			char *sha1;
-			size_t len = strlen(item->string);
-			if (len < hexsz)
-				continue;
-			sha1 = item->string + len - hexsz;
-			/*
-			 * Mark this pack for deletion, which ensures that this
-			 * pack won't be included in a MIDX (if `--write-midx`
-			 * was given) and that we will actually delete this pack
-			 * (if `-d` was given).
-			 */
-			if (!string_list_has_string(&names, sha1))
-				item->util = (void*)(uintptr_t)((size_t)item->util | DELETE_PACK);
-		}
-	}
+	if (delete_redundant && pack_everything & ALL_INTO_ONE)
+		mark_packs_for_deletion(&existing, &names);
 
 	if (write_midx) {
 		struct string_list include = STRING_LIST_INIT_NODUP;