Message ID | b31c60d71266c40ae23a619d5ac5fd99148e1649.1596480582.git.me@ttaylorr.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | more miscellaneous Bloom filter improvements | expand |
On 8/3/2020 2:57 PM, Taylor Blau wrote: > For locality, 'compute_bloom_filters()' sorts the commits for which it > wants to compute Bloom filters in a preferred order (cf., 3d11275505 > (commit-graph: examine commits by generation number, 2020-03-30) for > details). > > The subsequent patch will want to recover the new graph position of each > commit. Since the 'packed_commit_list' already stores a double-pointer, > avoid a 'COPY_ARRAY' and instead keep track of an index into the > original list. (Use an integer index instead of a memory address, since > this involves a needlessly confusing triple-pointer). It took me a little while to grok that we are switching from sorting a list of commit pointers to sorting a list of integers. However, that makes a lot of sense. It preserves the commit list sorted by OID for binary search, which you will need soon. Perhaps another change would need that at another time, too. > Alter the two sorting routines 'commit_pos_cmp' and 'commit_gen_cmp' to > take into account the packed_commit_list they are sorting with respect > to. Since 'compute_bloom_filters()' is the only caller for each of those > comparison functions, no other call-sites need updating. Parsing the changes to these functions is the most complicated, because of the int-to-commit indirection. I think they are correct and as easy to read as possible. -Stolee
On Tue, Aug 04, 2020 at 08:31:53AM -0400, Derrick Stolee wrote: > On 8/3/2020 2:57 PM, Taylor Blau wrote: > > For locality, 'compute_bloom_filters()' sorts the commits for which it > > wants to compute Bloom filters in a preferred order (cf., 3d11275505 > > (commit-graph: examine commits by generation number, 2020-03-30) for > > details). > > > > The subsequent patch will want to recover the new graph position of each > > commit. Since the 'packed_commit_list' already stores a double-pointer, > > avoid a 'COPY_ARRAY' and instead keep track of an index into the > > original list. (Use an integer index instead of a memory address, since > > this involves a needlessly confusing triple-pointer). > > It took me a little while to grok that we are switching from sorting > a list of commit pointers to sorting a list of integers. However, that > makes a lot of sense. It preserves the commit list sorted by OID for > binary search, which you will need soon. Perhaps another change would > need that at another time, too. Yeah. I had to spend some additional time with this patch (at least back when it was written in terms of 'struct commit ***'s) to convince myself of its correctness, too. I think that this is ultimately the right thing, and that it is probably as simple as I can make it without refactoring the packed_commit_list, which I think is squarely outside the scope of this (already-large) series ;). > > Alter the two sorting routines 'commit_pos_cmp' and 'commit_gen_cmp' to > > take into account the packed_commit_list they are sorting with respect > > to. Since 'compute_bloom_filters()' is the only caller for each of those > > comparison functions, no other call-sites need updating. > > Parsing the changes to these functions is the most complicated, because > of the int-to-commit indirection. I think they are correct and as easy > to read as possible. > > -Stolee Thanks, Taylor
diff --git a/commit-graph.c b/commit-graph.c index cb9d7fea04..d6ea556649 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -79,10 +79,18 @@ static void set_commit_pos(struct repository *r, const struct object_id *oid) *commit_pos_at(&commit_pos, commit) = max_pos++; } -static int commit_pos_cmp(const void *va, const void *vb) +struct packed_commit_list { + struct commit **list; + int nr; + int alloc; +}; + +static int commit_pos_cmp(const void *va, const void *vb, void *ctx) { - const struct commit *a = *(const struct commit **)va; - const struct commit *b = *(const struct commit **)vb; + struct packed_commit_list *commits = ctx; + + const struct commit *a = commits->list[*(int *)va]; + const struct commit *b = commits->list[*(int *)vb]; return commit_pos_at(&commit_pos, a) - commit_pos_at(&commit_pos, b); } @@ -139,10 +147,12 @@ static struct commit_graph_data *commit_graph_data_at(const struct commit *c) return data; } -static int commit_gen_cmp(const void *va, const void *vb) +static int commit_gen_cmp(const void *va, const void *vb, void *ctx) { - const struct commit *a = *(const struct commit **)va; - const struct commit *b = *(const struct commit **)vb; + struct packed_commit_list *commits = ctx; + + const struct commit *a = commits->list[*(int *)va]; + const struct commit *b = commits->list[*(int *)vb]; uint32_t generation_a = commit_graph_generation(a); uint32_t generation_b = commit_graph_generation(b); @@ -923,12 +933,6 @@ struct tree *get_commit_tree_in_graph(struct repository *r, const struct commit return get_commit_tree_in_graph_one(r, r->objects->commit_graph, c); } -struct packed_commit_list { - struct commit **list; - int nr; - int alloc; -}; - struct packed_oid_list { struct object_id *list; int nr; @@ -1389,7 +1393,7 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) { int i; struct progress *progress = NULL; - struct commit **sorted_commits; + int *sorted_commits; init_bloom_filters(); @@ -1399,15 +1403,15 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) ctx->commits.nr); ALLOC_ARRAY(sorted_commits, ctx->commits.nr); - COPY_ARRAY(sorted_commits, ctx->commits.list, ctx->commits.nr); - - if (ctx->order_by_pack) - QSORT(sorted_commits, ctx->commits.nr, commit_pos_cmp); - else - QSORT(sorted_commits, ctx->commits.nr, commit_gen_cmp); + for (i = 0; i < ctx->commits.nr; i++) + sorted_commits[i] = i; + QSORT_S(sorted_commits, ctx->commits.nr, + ctx->order_by_pack ? commit_pos_cmp : commit_gen_cmp, + &ctx->commits); for (i = 0; i < ctx->commits.nr; i++) { - struct commit *c = sorted_commits[i]; + int pos = sorted_commits[i]; + struct commit *c = ctx->commits.list[pos]; struct bloom_filter *filter = get_bloom_filter(ctx->r, c, 1); ctx->total_bloom_filter_data_size += sizeof(unsigned char) * filter->len; display_progress(progress, i + 1);
For locality, 'compute_bloom_filters()' sorts the commits for which it wants to compute Bloom filters in a preferred order (cf., 3d11275505 (commit-graph: examine commits by generation number, 2020-03-30) for details). The subsequent patch will want to recover the new graph position of each commit. Since the 'packed_commit_list' already stores a double-pointer, avoid a 'COPY_ARRAY' and instead keep track of an index into the original list. (Use an integer index instead of a memory address, since this involves a needlessly confusing triple-pointer). Alter the two sorting routines 'commit_pos_cmp' and 'commit_gen_cmp' to take into account the packed_commit_list they are sorting with respect to. Since 'compute_bloom_filters()' is the only caller for each of those comparison functions, no other call-sites need updating. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- commit-graph.c | 44 ++++++++++++++++++++++++-------------------- 1 file changed, 24 insertions(+), 20 deletions(-)