[06/10] commit-graph.c: sort index into commits list

Message ID	b31c60d71266c40ae23a619d5ac5fd99148e1649.1596480582.git.me@ttaylorr.com (mailing list archive)
State	Superseded
Headers	show Return-Path: <SRS0=/7R8=BN=vger.kernel.org=git-owner@kernel.org> Date: Mon, 3 Aug 2020 14:57:30 -0400 From: Taylor Blau <me@ttaylorr.com> To: git@vger.kernel.org Cc: peff@peff.net, dstolee@microsoft.com Subject: [PATCH 06/10] commit-graph.c: sort index into commits list Message-ID: <b31c60d71266c40ae23a619d5ac5fd99148e1649.1596480582.git.me@ttaylorr.com> References: <cover.1596480582.git.me@ttaylorr.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <cover.1596480582.git.me@ttaylorr.com> Sender: git-owner@vger.kernel.org Precedence: bulk
Series	more miscellaneous Bloom filter improvements \| expand [00/10] more miscellaneous Bloom filter improvements [01/10] commit-graph: introduce 'get_bloom_filter_settings()' [02/10] commit-graph: pass a 'struct repository *' in more places [03/10] t4216: use an '&&'-chain [04/10] t/helper/test-read-graph.c: prepare repo settings [05/10] commit-graph: respect 'commitgraph.readChangedPaths' [06/10] commit-graph.c: sort index into commits list [07/10] commit-graph: add large-filters bitmap chunk [08/10] bloom: split 'get_bloom_filter()' in two [09/10] commit-graph: rename 'split_commit_graph_opts' [10/10] builtin/commit-graph.c: introduce '--max-new-filters=<n>'

Message ID

b31c60d71266c40ae23a619d5ac5fd99148e1649.1596480582.git.me@ttaylorr.com (mailing list archive)

State

Superseded

Headers

Date: Mon, 3 Aug 2020 14:57:30 -0400
From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: peff@peff.net, dstolee@microsoft.com
Subject: [PATCH 06/10] commit-graph.c: sort index into commits list
Message-ID: 
 <b31c60d71266c40ae23a619d5ac5fd99148e1649.1596480582.git.me@ttaylorr.com>
References: <cover.1596480582.git.me@ttaylorr.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <cover.1596480582.git.me@ttaylorr.com>
Sender: git-owner@vger.kernel.org
Precedence: bulk

Series

more miscellaneous Bloom filter improvements | expand

Commit Message

Taylor Blau Aug. 3, 2020, 6:57 p.m. UTC

For locality, 'compute_bloom_filters()' sorts the commits for which it
wants to compute Bloom filters in a preferred order (cf., 3d11275505
(commit-graph: examine commits by generation number, 2020-03-30) for
details).

The subsequent patch will want to recover the new graph position of each
commit. Since the 'packed_commit_list' already stores a double-pointer,
avoid a 'COPY_ARRAY' and instead keep track of an index into the
original list. (Use an integer index instead of a memory address, since
this involves a needlessly confusing triple-pointer).

Alter the two sorting routines 'commit_pos_cmp' and 'commit_gen_cmp' to
take into account the packed_commit_list they are sorting with respect
to. Since 'compute_bloom_filters()' is the only caller for each of those
comparison functions, no other call-sites need updating.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 commit-graph.c | 44 ++++++++++++++++++++++++--------------------
 1 file changed, 24 insertions(+), 20 deletions(-)

Comments

Derrick Stolee Aug. 4, 2020, 12:31 p.m. UTC | #1

On 8/3/2020 2:57 PM, Taylor Blau wrote:
> For locality, 'compute_bloom_filters()' sorts the commits for which it
> wants to compute Bloom filters in a preferred order (cf., 3d11275505
> (commit-graph: examine commits by generation number, 2020-03-30) for
> details).
> 
> The subsequent patch will want to recover the new graph position of each
> commit. Since the 'packed_commit_list' already stores a double-pointer,
> avoid a 'COPY_ARRAY' and instead keep track of an index into the
> original list. (Use an integer index instead of a memory address, since
> this involves a needlessly confusing triple-pointer).

It took me a little while to grok that we are switching from sorting
a list of commit pointers to sorting a list of integers. However, that
makes a lot of sense. It preserves the commit list sorted by OID for
binary search, which you will need soon. Perhaps another change would
need that at another time, too.

> Alter the two sorting routines 'commit_pos_cmp' and 'commit_gen_cmp' to
> take into account the packed_commit_list they are sorting with respect
> to. Since 'compute_bloom_filters()' is the only caller for each of those
> comparison functions, no other call-sites need updating.

Parsing the changes to these functions is the most complicated, because
of the int-to-commit indirection. I think they are correct and as easy
to read as possible.

-Stolee

Taylor Blau Aug. 4, 2020, 8:10 p.m. UTC | #2

On Tue, Aug 04, 2020 at 08:31:53AM -0400, Derrick Stolee wrote:
> On 8/3/2020 2:57 PM, Taylor Blau wrote:
> > For locality, 'compute_bloom_filters()' sorts the commits for which it
> > wants to compute Bloom filters in a preferred order (cf., 3d11275505
> > (commit-graph: examine commits by generation number, 2020-03-30) for
> > details).
> >
> > The subsequent patch will want to recover the new graph position of each
> > commit. Since the 'packed_commit_list' already stores a double-pointer,
> > avoid a 'COPY_ARRAY' and instead keep track of an index into the
> > original list. (Use an integer index instead of a memory address, since
> > this involves a needlessly confusing triple-pointer).
>
> It took me a little while to grok that we are switching from sorting
> a list of commit pointers to sorting a list of integers. However, that
> makes a lot of sense. It preserves the commit list sorted by OID for
> binary search, which you will need soon. Perhaps another change would
> need that at another time, too.

Yeah. I had to spend some additional time with this patch (at least back
when it was written in terms of 'struct commit ***'s) to convince myself
of its correctness, too.

I think that this is ultimately the right thing, and that it is probably
as simple as I can make it without refactoring the packed_commit_list,
which I think is squarely outside the scope of this (already-large)
series ;).

> > Alter the two sorting routines 'commit_pos_cmp' and 'commit_gen_cmp' to
> > take into account the packed_commit_list they are sorting with respect
> > to. Since 'compute_bloom_filters()' is the only caller for each of those
> > comparison functions, no other call-sites need updating.
>
> Parsing the changes to these functions is the most complicated, because
> of the int-to-commit indirection. I think they are correct and as easy
> to read as possible.
>
> -Stolee

Thanks,
Taylor

diff --git a/commit-graph.c b/commit-graph.c
index cb9d7fea04..d6ea556649 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -79,10 +79,18 @@  static void set_commit_pos(struct repository *r, const struct object_id *oid)
 	*commit_pos_at(&commit_pos, commit) = max_pos++;
 }
 
-static int commit_pos_cmp(const void *va, const void *vb)
+struct packed_commit_list {
+	struct commit **list;
+	int nr;
+	int alloc;
+};
+
+static int commit_pos_cmp(const void *va, const void *vb, void *ctx)
 {
-	const struct commit *a = *(const struct commit **)va;
-	const struct commit *b = *(const struct commit **)vb;
+	struct packed_commit_list *commits = ctx;
+
+	const struct commit *a = commits->list[*(int *)va];
+	const struct commit *b = commits->list[*(int *)vb];
 	return commit_pos_at(&commit_pos, a) -
 	       commit_pos_at(&commit_pos, b);
 }
@@ -139,10 +147,12 @@  static struct commit_graph_data *commit_graph_data_at(const struct commit *c)
 	return data;
 }
 
-static int commit_gen_cmp(const void *va, const void *vb)
+static int commit_gen_cmp(const void *va, const void *vb, void *ctx)
 {
-	const struct commit *a = *(const struct commit **)va;
-	const struct commit *b = *(const struct commit **)vb;
+	struct packed_commit_list *commits = ctx;
+
+	const struct commit *a = commits->list[*(int *)va];
+	const struct commit *b = commits->list[*(int *)vb];
 
 	uint32_t generation_a = commit_graph_generation(a);
 	uint32_t generation_b = commit_graph_generation(b);
@@ -923,12 +933,6 @@  struct tree *get_commit_tree_in_graph(struct repository *r, const struct commit
 	return get_commit_tree_in_graph_one(r, r->objects->commit_graph, c);
 }
 
-struct packed_commit_list {
-	struct commit **list;
-	int nr;
-	int alloc;
-};
-
 struct packed_oid_list {
 	struct object_id *list;
 	int nr;
@@ -1389,7 +1393,7 @@  static void compute_bloom_filters(struct write_commit_graph_context *ctx)
 {
 	int i;
 	struct progress *progress = NULL;
-	struct commit **sorted_commits;
+	int *sorted_commits;
 
 	init_bloom_filters();
 
@@ -1399,15 +1403,15 @@  static void compute_bloom_filters(struct write_commit_graph_context *ctx)
 			ctx->commits.nr);
 
 	ALLOC_ARRAY(sorted_commits, ctx->commits.nr);
-	COPY_ARRAY(sorted_commits, ctx->commits.list, ctx->commits.nr);
-
-	if (ctx->order_by_pack)
-		QSORT(sorted_commits, ctx->commits.nr, commit_pos_cmp);
-	else
-		QSORT(sorted_commits, ctx->commits.nr, commit_gen_cmp);
+	for (i = 0; i < ctx->commits.nr; i++)
+		sorted_commits[i] = i;
+	QSORT_S(sorted_commits, ctx->commits.nr,
+		ctx->order_by_pack ? commit_pos_cmp : commit_gen_cmp,
+		&ctx->commits);
 
 	for (i = 0; i < ctx->commits.nr; i++) {
-		struct commit *c = sorted_commits[i];
+		int pos = sorted_commits[i];
+		struct commit *c = ctx->commits.list[pos];
 		struct bloom_filter *filter = get_bloom_filter(ctx->r, c, 1);
 		ctx->total_bloom_filter_data_size += sizeof(unsigned char) * filter->len;
 		display_progress(progress, i + 1);

[06/10] commit-graph.c: sort index into commits list

Commit Message

Comments

Patch