diff mbox series

[v2,7/9] commit-graph: drop count_distinct_commits() function

Message ID X85+RnO5rJQHnmmB@coredump.intra.peff.net (mailing list archive)
State Accepted
Commit 1cbdbf3bef7e9167794cb2c12f9af1a450584ebe
Headers show
Series misc commit-graph and oid-array cleanups | expand

Commit Message

Jeff King Dec. 7, 2020, 7:11 p.m. UTC
When writing a commit graph, we collect a list of object ids in an
array, which we'll eventually copy into an array of "struct commit"
pointers. Before we do that, though, we count the number of distinct
commit entries. There's a subtle bug in this step, though.

We eliminate not only duplicate oids, but also in split mode, any oids
which are not commits or which are already in a graph file. However, the
loop starts at index 1, always counting index 0 as distinct. And indeed
it can't be a duplicate, since we check for those by comparing against
the previous entry, and there isn't one for index 0. But it could be a
commit that's already in a graph file, and we'd overcount the number of
commits by 1 in that case.

That turns out not to be a problem, though. The only things we do with
the count are:

  - check if our count will overflow our data structures. But the limit
    there is 2^31 commits, so while this is a useful check, the
    off-by-one is not likely to matter.

  - pre-allocate the array of commit pointers. But over-allocating by
    one isn't a problem; we'll just waste a few extra bytes.

The bug would be easy enough to fix, but we can observe that neither of
those steps is necessary.

After building the actual commit array, we'll likewise check its count
for overflow. So the extra check of the distinct commit count here is
redundant.

And likewise we use ALLOC_GROW() when building the commit array, so
there's no need to preallocate it (it's possible that doing so is
slightly more efficient, but if we care we can just optimistically
allocate one slot for each oid; I didn't bother here).

So count_distinct_commits() isn't doing anything useful. Let's just get
rid of that step.

Note that a side effect of the function was that we sorted the list of
oids, which we do rely on in copy_oids_to_commits(), since it must also
skip the duplicates. So we'll move the qsort there. I didn't copy the
"TODO" about adding more progress meters. It's actually quite hard to
make a repository large enough for this qsort would take an appreciable
amount of time, so this doesn't seem like a useful note.

Signed-off-by: Jeff King <peff@peff.net>
---
 commit-graph.c | 43 ++-----------------------------------------
 1 file changed, 2 insertions(+), 41 deletions(-)
diff mbox series

Patch

diff --git a/commit-graph.c b/commit-graph.c
index 6f62a07313..1ac3516cf5 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1588,35 +1588,6 @@  static void fill_oids_from_all_packs(struct write_commit_graph_context *ctx)
 	stop_progress(&ctx->progress);
 }
 
-static uint32_t count_distinct_commits(struct write_commit_graph_context *ctx)
-{
-	uint32_t i, count_distinct = 1;
-
-	if (ctx->report_progress)
-		ctx->progress = start_delayed_progress(
-			_("Counting distinct commits in commit graph"),
-			ctx->oids.nr);
-	display_progress(ctx->progress, 0); /* TODO: Measure QSORT() progress */
-	QSORT(ctx->oids.list, ctx->oids.nr, oid_compare);
-
-	for (i = 1; i < ctx->oids.nr; i++) {
-		display_progress(ctx->progress, i + 1);
-		if (!oideq(&ctx->oids.list[i - 1], &ctx->oids.list[i])) {
-			if (ctx->split) {
-				struct commit *c = lookup_commit(ctx->r, &ctx->oids.list[i]);
-
-				if (!c || commit_graph_position(c) != COMMIT_NOT_FROM_GRAPH)
-					continue;
-			}
-
-			count_distinct++;
-		}
-	}
-	stop_progress(&ctx->progress);
-
-	return count_distinct;
-}
-
 static void copy_oids_to_commits(struct write_commit_graph_context *ctx)
 {
 	uint32_t i;
@@ -1628,6 +1599,7 @@  static void copy_oids_to_commits(struct write_commit_graph_context *ctx)
 		ctx->progress = start_delayed_progress(
 			_("Finding extra edges in commit graph"),
 			ctx->oids.nr);
+	QSORT(ctx->oids.list, ctx->oids.nr, oid_compare);
 	for (i = 0; i < ctx->oids.nr; i++) {
 		unsigned int num_parents;
 
@@ -2155,7 +2127,7 @@  int write_commit_graph(struct object_directory *odb,
 		       const struct commit_graph_opts *opts)
 {
 	struct write_commit_graph_context *ctx;
-	uint32_t i, count_distinct = 0;
+	uint32_t i;
 	int res = 0;
 	int replace = 0;
 	struct bloom_filter_settings bloom_settings = DEFAULT_BLOOM_FILTER_SETTINGS;
@@ -2268,17 +2240,6 @@  int write_commit_graph(struct object_directory *odb,
 
 	close_reachable(ctx);
 
-	count_distinct = count_distinct_commits(ctx);
-
-	if (count_distinct >= GRAPH_EDGE_LAST_MASK) {
-		error(_("the commit graph format cannot write %d commits"), count_distinct);
-		res = -1;
-		goto cleanup;
-	}
-
-	ctx->commits.alloc = count_distinct;
-	ALLOC_ARRAY(ctx->commits.list, ctx->commits.alloc);
-
 	copy_oids_to_commits(ctx);
 
 	if (ctx->commits.nr >= GRAPH_EDGE_LAST_MASK) {