diff mbox series

[v3,2/3] index-pack --promisor: don't check blobs

Message ID a1d2a202031e4b932d053b5c216ecd57ec9c728a.1733262662.git.jonathantanmy@google.com (mailing list archive)
State Accepted
Commit 36198026d85848119a8eeb9286d10e795a9e0461
Headers show
Series Performance improvements for repacking non-promisor objects | expand

Commit Message

Jonathan Tan Dec. 3, 2024, 9:52 p.m. UTC
As a follow-up to the parent of this commit, it was found that not
checking for the existence of blobs linked from trees sped up the fetch
from 24m47.815s to 2m2.127s. Teach Git to do that.

The tradeoff of not checking blobs is documented in a code comment.

(Blobs may also be linked from tag objects, but it is impossible to know
the type of an object linked from a tag object without looking it up in
the object database, so the code for that is untouched.)

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 builtin/index-pack.c | 31 ++++++++++++++++++++++++++++++-
 1 file changed, 30 insertions(+), 1 deletion(-)
diff mbox series

Patch

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index d1c777a6af..2e90fe186e 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -817,6 +817,35 @@  static void record_outgoing_link(const struct object_id *oid)
 	oidset_insert(&outgoing_links, oid);
 }
 
+static void maybe_record_name_entry(const struct name_entry *entry)
+{
+	/*
+	 * Checking only trees here results in a significantly faster packfile
+	 * indexing, but the drawback is that if the packfile to be indexed
+	 * references a local blob only directly (that is, never through a
+	 * local tree), that local blob is in danger of being garbage
+	 * collected. Such a situation may arise if we push local commits,
+	 * including one with a change to a blob in the root tree, and then the
+	 * server incorporates them into its main branch through a "rebase" or
+	 * "squash" merge strategy, and then we fetch the new main branch from
+	 * the server.
+	 *
+	 * This situation has not been observed yet - we have only noticed
+	 * missing commits, not missing trees or blobs. (In fact, if it were
+	 * believed that only missing commits are problematic, one could argue
+	 * that we should also exclude trees during the outgoing link check;
+	 * but it is safer to include them.)
+	 *
+	 * Due to the rarity of the situation (it has not been observed to
+	 * happen in real life), and because the "penalty" in such a situation
+	 * is merely to refetch the missing blob when it's needed (and this
+	 * happens only once - when refetched, the blob goes into a promisor
+	 * pack, so it won't be GC-ed, the tradeoff seems worth it.
+	*/
+	if (S_ISDIR(entry->mode))
+		record_outgoing_link(&entry->oid);
+}
+
 static void do_record_outgoing_links(struct object *obj)
 {
 	if (obj->type == OBJ_TREE) {
@@ -831,7 +860,7 @@  static void do_record_outgoing_links(struct object *obj)
 			 */
 			return;
 		while (tree_entry_gently(&desc, &entry))
-			record_outgoing_link(&entry.oid);
+			maybe_record_name_entry(&entry);
 	} else if (obj->type == OBJ_COMMIT) {
 		struct commit *commit = (struct commit *) obj;
 		struct commit_list *parents = commit->parents;