Message ID | b8c55ecf88d6229f13e05e8369adaf9e70ae1de0.1678111599.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | ahead-behind: new builtin for counting multiple commit ranges | expand |
On Mon, Mar 06, 2023 at 02:06:37PM +0000, Derrick Stolee via GitGitGadget wrote: > Signed-off-by: Derrick Stolee <derrickstolee@github.com> Having read and worked with this code before, I don't have a ton of substance to add here. But it was interesting to reread, and I left a few sprinklings here and there of some thing that we may want to consider for v2. Before that, though, IIRC we wrote most of this together, so I would be happy to have my: Co-authored-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> above your S-o-b here. But you've done so much work since we originally wrote this together that I don't mind being dropped here. Up to you :-). > @@ -71,5 +76,23 @@ int cmd_ahead_behind(int argc, const char **argv, const char *prefix) > if (!tips.nr) > return 0; > > + ALLOC_ARRAY(commits, tips.nr + 1); > + ALLOC_ARRAY(counts, tips.nr); > + > + for (i = 0; i < tips.nr; i++) { > + commits[i] = tips.items[i].util; > + counts[i].tip_index = i; > + counts[i].base_index = tips.nr; > + } > + commits[tips.nr] = base; > + > + ahead_behind(commits, tips.nr + 1, counts, tips.nr); > + > + for (i = 0; i < tips.nr; i++) > + printf("%s %d %d\n", tips.items[i].string, > + counts[i].ahead, counts[i].behind); > + > + free(counts); > + free(commits); > return 0; > } I have to say, the interface looks particularly well designed when you see the patches come together in this fashion. The builtin is doing basically no work except collating the user's input, passing it off to ahead_behind(), and then spitting out the results. Very nice ;-). > diff --git a/commit-reach.c b/commit-reach.c > index 2e33c599a82..87ccc2cd4f5 100644 > --- a/commit-reach.c > +++ b/commit-reach.c > @@ -8,6 +8,7 @@ > #include "revision.h" > #include "tag.h" > #include "commit-reach.h" > +#include "ewah/ewok.h" > > /* Remember to update object flag allocation in object.h */ There is a new use of PARENT2 (which we hardcode here as bit 17) below, but it is already covered as part of the object flag allocation table in object.h. So this comment has done its job over the years ;-). > #define PARENT1 (1u<<16) > @@ -941,3 +942,97 @@ struct commit_list *get_reachable_subset(struct commit **from, int nr_from, > > return found_commits; > } > + > +define_commit_slab(bit_arrays, struct bitmap *); > +static struct bit_arrays bit_arrays; > + > +static void insert_no_dup(struct prio_queue *queue, struct commit *c) > +{ > + if (c->object.flags & PARENT2) > + return; > + prio_queue_put(queue, c); > + c->object.flags |= PARENT2; > +} You mentioned this in the patch message, but: It may be worth noting here (or in the call to repo_clear_commit_marks() below) that the PARENT2 flag is used to detect and avoid duplicates in this list. > +static struct bitmap *init_bit_array(struct commit *c, int width) > +{ > + struct bitmap **bitmap = bit_arrays_at(&bit_arrays, c); > + if (!*bitmap) > + *bitmap = bitmap_word_alloc(width); > + return *bitmap; > +} > + > +static void free_bit_array(struct commit *c) > +{ > + struct bitmap **bitmap = bit_arrays_at(&bit_arrays, c); > + if (!*bitmap) > + return; > + bitmap_free(*bitmap); > + *bitmap = NULL; > +} > + > +void ahead_behind(struct commit **commits, size_t commits_nr, > + struct ahead_behind_count *counts, size_t counts_nr) > +{ > + struct prio_queue queue = { compare_commits_by_gen_then_commit_date }; > + size_t width = (commits_nr + BITS_IN_EWORD - 1) / BITS_IN_EWORD; > + size_t i; > + > + if (!commits_nr || !counts_nr) > + return; > + > + for (i = 0; i < counts_nr; i++) { > + counts[i].ahead = 0; > + counts[i].behind = 0; > + } > + > + ensure_generations_valid(commits, commits_nr); > + > + init_bit_arrays(&bit_arrays); > + > + for (i = 0; i < commits_nr; i++) { > + struct commit *c = commits[i]; > + struct bitmap *bitmap = init_bit_array(c, width); > + > + bitmap_set(bitmap, i); > + insert_no_dup(&queue, c); > + } > + > + while (queue_has_nonstale(&queue)) { > + struct commit *c = prio_queue_get(&queue); > + struct commit_list *p; > + struct bitmap *bitmap_c = init_bit_array(c, width); > + > + for (i = 0; i < counts_nr; i++) { > + int reach_from_tip = bitmap_get(bitmap_c, counts[i].tip_index); > + int reach_from_base = bitmap_get(bitmap_c, counts[i].base_index); Since we're XORing these, I'd hate to get bit by bitmap_get returning something other than 0 or 1. It doesn't, since the return value (for any "pos" for which it holds that `EWAH_BLOCKI(pos) < self->word_alloc`) is: (self->words[EWAH_BLOCK(pos)] & EWAH_MASK(pos)) != 0 so we'll always be guaranteed to zero or one. But if we retuned instead: self->words[EWAH_BLOCK(pos)] & EWAH_MASK(pos) ...this code would break in a very annoying and hard-to-debug way ;-). I wonder if we might do a little of belt-and-suspenders here by calling these like: int reach_from_tip = !!(bitmap_get(bitmap_c, counts[i].tip_index)); int reach_from_base = !!(bitmap_get(bitmap_c, counts[i].base_index)); where the "!!(...)" is new. > + if (reach_from_tip ^ reach_from_base) { > + if (reach_from_base) > + counts[i].behind++; > + else > + counts[i].ahead++; > + } > + } I have gone back and forth so many times on this code :-). I think the XORs are fine, though. > + for (p = c->parents; p; p = p->next) { > + struct bitmap *bitmap_p; > + > + parse_commit(p->item); > + > + bitmap_p = init_bit_array(p->item, width); > + bitmap_or(bitmap_p, bitmap_c); > + > + if (bitmap_popcount(bitmap_p) == commits_nr) > + p->item->object.flags |= STALE; > + > + insert_no_dup(&queue, p->item); Do we care about inserting p->item when the above condition is met? IOW, would it be OK to instead write: if (bitmap_popcount(bitmap_p) == commits_nr) p->item->object.flags |= STALE; else insert_no_dup(&queue, p->item); > diff --git a/commit-reach.h b/commit-reach.h > index 148b56fea50..1780f9317bf 100644 > --- a/commit-reach.h > +++ b/commit-reach.h > @@ -104,4 +104,34 @@ struct commit_list *get_reachable_subset(struct commit **from, int nr_from, > struct commit **to, int nr_to, > unsigned int reachable_flag); > > +struct ahead_behind_count { > + /** > + * As input, the *_index members indicate which positions in > + * the 'tips' array correspond to the tip and base of this > + * comparison. > + */ > + size_t tip_index; > + size_t base_index; > + > + /** > + * These values store the computed counts for each side of the > + * symmetric difference: > + * > + * 'ahead' stores the number of commits reachable from the tip > + * and not reachable from the base. > + * > + * 'behind' stores the number of commits reachable from the base > + * and not reachable from the tip. > + */ > + int ahead; > + int behind; > +}; Should these be unsigned values? I don't think we have a sensible interpretation for what a negative "ahead" or "behind" could would mean. I guess behind "behind" by "N" means you're "ahead" by "-N", but I don't think it's practical ;-). > + > +/** Here and elsewhere, these kind of doc-comments are a little non-standard, and IIRC the opening should instead be "/*" (with one asterisk instead of two). Thanks, Taylor
On 3/6/2023 8:05 PM, Taylor Blau wrote: > On Mon, Mar 06, 2023 at 02:06:37PM +0000, Derrick Stolee via GitGitGadget wrote: >> Signed-off-by: Derrick Stolee <derrickstolee@github.com> > > Having read and worked with this code before, I don't have a ton of > substance to add here. But it was interesting to reread, and I left a > few sprinklings here and there of some thing that we may want to > consider for v2. > > Before that, though, IIRC we wrote most of this together, so I would be > happy to have my: > > Co-authored-by: Taylor Blau <me@ttaylorr.com> > Signed-off-by: Taylor Blau <me@ttaylorr.com> > > above your S-o-b here. But you've done so much work since we originally > wrote this together that I don't mind being dropped here. Up to you :-). Sounds good. Sorry for forgetting that collaboration. >> +static void insert_no_dup(struct prio_queue *queue, struct commit *c) >> +{ >> + if (c->object.flags & PARENT2) >> + return; >> + prio_queue_put(queue, c); >> + c->object.flags |= PARENT2; >> +} > > You mentioned this in the patch message, but: > > It may be worth noting here (or in the call to repo_clear_commit_marks() > below) that the PARENT2 flag is used to detect and avoid duplicates in > this list. I'll add a comment before we clear the bits, to be clear about why we need each bit. >> + while (queue_has_nonstale(&queue)) { >> + struct commit *c = prio_queue_get(&queue); >> + struct commit_list *p; >> + struct bitmap *bitmap_c = init_bit_array(c, width); >> + >> + for (i = 0; i < counts_nr; i++) { >> + int reach_from_tip = bitmap_get(bitmap_c, counts[i].tip_index); >> + int reach_from_base = bitmap_get(bitmap_c, counts[i].base_index); > > Since we're XORing these, I'd hate to get bit by bitmap_get returning > something other than 0 or 1. It doesn't, since the return value (for any > "pos" for which it holds that `EWAH_BLOCKI(pos) < self->word_alloc`) is: > I wonder if we might do a little of belt-and-suspenders here by calling > these like: > > int reach_from_tip = !!(bitmap_get(bitmap_c, counts[i].tip_index)); > int reach_from_base = !!(bitmap_get(bitmap_c, counts[i].base_index)); > > where the "!!(...)" is new. Can't hurt. >> + if (bitmap_popcount(bitmap_p) == commits_nr) >> + p->item->object.flags |= STALE; >> + >> + insert_no_dup(&queue, p->item); > > Do we care about inserting p->item when the above condition is met? IOW, > would it be OK to instead write: > > if (bitmap_popcount(bitmap_p) == commits_nr) > p->item->object.flags |= STALE; > else > insert_no_dup(&queue, p->item); We need to push p->item to the queue, even if stale, because it may need to be walked in order to pass the bitmaps (and the STALE bit) to other commits that were reached by only a subset of the tips. Here's an example: A B |/ \ C D | / E / |/ F If A and B are the starting commits, then C is stale when we walk it, so its parent E would be stale. Your proposed change would not add it to the queue, and thus F would never see that it is stale and would be counted as reachable from B but not A. >> diff --git a/commit-reach.h b/commit-reach.h >> index 148b56fea50..1780f9317bf 100644 >> --- a/commit-reach.h >> +++ b/commit-reach.h >> @@ -104,4 +104,34 @@ struct commit_list *get_reachable_subset(struct commit **from, int nr_from, >> struct commit **to, int nr_to, >> unsigned int reachable_flag); >> >> +struct ahead_behind_count { >> + /** >> + * As input, the *_index members indicate which positions in >> + * the 'tips' array correspond to the tip and base of this >> + * comparison. >> + */ >> + size_t tip_index; >> + size_t base_index; >> + >> + /** >> + * These values store the computed counts for each side of the >> + * symmetric difference: >> + * >> + * 'ahead' stores the number of commits reachable from the tip >> + * and not reachable from the base. >> + * >> + * 'behind' stores the number of commits reachable from the base >> + * and not reachable from the tip. >> + */ >> + int ahead; >> + int behind; >> +}; > > Should these be unsigned values? I don't think we have a sensible > interpretation for what a negative "ahead" or "behind" could would mean. > I guess behind "behind" by "N" means you're "ahead" by "-N", but I don't > think it's practical ;-). Unsigned sounds good. >> + >> +/** > > Here and elsewhere, these kind of doc-comments are a little > non-standard, and IIRC the opening should instead be "/*" (with one > asterisk instead of two). I think double-asterisk is the preferred choice for new things, but commit-reach.h only uses a single asterisk so I'll change this to be consistent. Thanks, -Stolee
diff --git a/builtin/ahead-behind.c b/builtin/ahead-behind.c index e4f65fc0548..c06b95b5f37 100644 --- a/builtin/ahead-behind.c +++ b/builtin/ahead-behind.c @@ -2,6 +2,7 @@ #include "parse-options.h" #include "config.h" #include "commit.h" +#include "commit-reach.h" static const char * const ahead_behind_usage[] = { N_("git ahead-behind --base=<ref> [ --stdin | <revs> ]"), @@ -29,8 +30,12 @@ static int handle_arg(struct string_list *tips, const char *arg) int cmd_ahead_behind(int argc, const char **argv, const char *prefix) { const char *base_ref = NULL; + struct commit *base; int from_stdin = 0; struct string_list tips = STRING_LIST_INIT_DUP; + struct commit **commits; + struct ahead_behind_count *counts; + size_t i; struct option ahead_behind_opts[] = { OPT_STRING('b', "base", &base_ref, N_("base"), N_("base reference to process")), @@ -71,5 +76,23 @@ int cmd_ahead_behind(int argc, const char **argv, const char *prefix) if (!tips.nr) return 0; + ALLOC_ARRAY(commits, tips.nr + 1); + ALLOC_ARRAY(counts, tips.nr); + + for (i = 0; i < tips.nr; i++) { + commits[i] = tips.items[i].util; + counts[i].tip_index = i; + counts[i].base_index = tips.nr; + } + commits[tips.nr] = base; + + ahead_behind(commits, tips.nr + 1, counts, tips.nr); + + for (i = 0; i < tips.nr; i++) + printf("%s %d %d\n", tips.items[i].string, + counts[i].ahead, counts[i].behind); + + free(counts); + free(commits); return 0; } diff --git a/commit-reach.c b/commit-reach.c index 2e33c599a82..87ccc2cd4f5 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -8,6 +8,7 @@ #include "revision.h" #include "tag.h" #include "commit-reach.h" +#include "ewah/ewok.h" /* Remember to update object flag allocation in object.h */ #define PARENT1 (1u<<16) @@ -941,3 +942,97 @@ struct commit_list *get_reachable_subset(struct commit **from, int nr_from, return found_commits; } + +define_commit_slab(bit_arrays, struct bitmap *); +static struct bit_arrays bit_arrays; + +static void insert_no_dup(struct prio_queue *queue, struct commit *c) +{ + if (c->object.flags & PARENT2) + return; + prio_queue_put(queue, c); + c->object.flags |= PARENT2; +} + +static struct bitmap *init_bit_array(struct commit *c, int width) +{ + struct bitmap **bitmap = bit_arrays_at(&bit_arrays, c); + if (!*bitmap) + *bitmap = bitmap_word_alloc(width); + return *bitmap; +} + +static void free_bit_array(struct commit *c) +{ + struct bitmap **bitmap = bit_arrays_at(&bit_arrays, c); + if (!*bitmap) + return; + bitmap_free(*bitmap); + *bitmap = NULL; +} + +void ahead_behind(struct commit **commits, size_t commits_nr, + struct ahead_behind_count *counts, size_t counts_nr) +{ + struct prio_queue queue = { compare_commits_by_gen_then_commit_date }; + size_t width = (commits_nr + BITS_IN_EWORD - 1) / BITS_IN_EWORD; + size_t i; + + if (!commits_nr || !counts_nr) + return; + + for (i = 0; i < counts_nr; i++) { + counts[i].ahead = 0; + counts[i].behind = 0; + } + + ensure_generations_valid(commits, commits_nr); + + init_bit_arrays(&bit_arrays); + + for (i = 0; i < commits_nr; i++) { + struct commit *c = commits[i]; + struct bitmap *bitmap = init_bit_array(c, width); + + bitmap_set(bitmap, i); + insert_no_dup(&queue, c); + } + + while (queue_has_nonstale(&queue)) { + struct commit *c = prio_queue_get(&queue); + struct commit_list *p; + struct bitmap *bitmap_c = init_bit_array(c, width); + + for (i = 0; i < counts_nr; i++) { + int reach_from_tip = bitmap_get(bitmap_c, counts[i].tip_index); + int reach_from_base = bitmap_get(bitmap_c, counts[i].base_index); + + if (reach_from_tip ^ reach_from_base) { + if (reach_from_base) + counts[i].behind++; + else + counts[i].ahead++; + } + } + + for (p = c->parents; p; p = p->next) { + struct bitmap *bitmap_p; + + parse_commit(p->item); + + bitmap_p = init_bit_array(p->item, width); + bitmap_or(bitmap_p, bitmap_c); + + if (bitmap_popcount(bitmap_p) == commits_nr) + p->item->object.flags |= STALE; + + insert_no_dup(&queue, p->item); + } + + free_bit_array(c); + } + + repo_clear_commit_marks(the_repository, PARENT2 | STALE); + clear_bit_arrays(&bit_arrays); + clear_prio_queue(&queue); +} diff --git a/commit-reach.h b/commit-reach.h index 148b56fea50..1780f9317bf 100644 --- a/commit-reach.h +++ b/commit-reach.h @@ -104,4 +104,34 @@ struct commit_list *get_reachable_subset(struct commit **from, int nr_from, struct commit **to, int nr_to, unsigned int reachable_flag); +struct ahead_behind_count { + /** + * As input, the *_index members indicate which positions in + * the 'tips' array correspond to the tip and base of this + * comparison. + */ + size_t tip_index; + size_t base_index; + + /** + * These values store the computed counts for each side of the + * symmetric difference: + * + * 'ahead' stores the number of commits reachable from the tip + * and not reachable from the base. + * + * 'behind' stores the number of commits reachable from the base + * and not reachable from the tip. + */ + int ahead; + int behind; +}; + +/** + * Given an array of commits and an array of ahead_behind_count pairs, + * compute the ahead/behind counts for each pair. + */ +void ahead_behind(struct commit **commits, size_t commits_nr, + struct ahead_behind_count *counts, size_t counts_nr); + #endif diff --git a/t/perf/p1500-graph-walks.sh b/t/perf/p1500-graph-walks.sh new file mode 100755 index 00000000000..c9ac4b7e6e2 --- /dev/null +++ b/t/perf/p1500-graph-walks.sh @@ -0,0 +1,25 @@ +#!/bin/sh + +test_description='Commit walk performance tests' +. ./perf-lib.sh + +test_perf_large_repo + +test_expect_success 'setup' ' + git for-each-ref --format="%(refname)" "refs/heads/*" "refs/tags/*" >allrefs && + sort -r allrefs | head -n 50 >refs && + git commit-graph write --reachable +' + +test_perf 'ahead-behind counts: git ahead-behind' ' + git ahead-behind --base=HEAD --stdin <refs +' + +test_perf 'ahead-behind counts: git rev-list' ' + for r in $(cat refs) + do + git rev-list --count "HEAD..$r" || return 1 + done +' + +test_done diff --git a/t/t4218-ahead-behind.sh b/t/t4218-ahead-behind.sh index 56f16515896..6658c919fdf 100755 --- a/t/t4218-ahead-behind.sh +++ b/t/t4218-ahead-behind.sh @@ -4,6 +4,16 @@ test_description='git ahead-behind command-line options' . ./test-lib.sh +test_expect_success 'setup simple history' ' + test_commit base && + git checkout -b right && + test_commit right && + git checkout -b left base && + test_commit left && + git checkout -b merge && + git merge right -m "merge" +' + test_expect_success 'git ahead-behind -h' ' test_must_fail git ahead-behind -h >out && grep "usage:" out @@ -14,6 +24,11 @@ test_expect_success 'git ahead-behind without --base' ' grep "usage:" err ' +test_expect_success 'git ahead-behind with broken --base' ' + test_must_fail git ahead-behind --base=bogus HEAD 2>err && + grep "could not resolve '\''bogus'\''" err +' + test_expect_success 'git ahead-behind with broken tip' ' test_must_fail git ahead-behind --base=HEAD bogus 2>err && grep "could not resolve '\''bogus'\''" err @@ -30,4 +45,56 @@ test_expect_success 'git ahead-behind without tips' ' test_must_be_empty err ' +test_expect_success 'git ahead-behind --base=base' ' + git ahead-behind --base=base base left right merge >actual && + + cat >expect <<-EOF && + base 0 0 + left 1 0 + right 1 0 + merge 3 0 + EOF + + test_cmp expect actual +' + +test_expect_success 'git ahead-behind --base=left' ' + git ahead-behind --base=left base left right merge >actual && + + cat >expect <<-EOF && + base 0 1 + left 0 0 + right 1 1 + merge 2 0 + EOF + + test_cmp expect actual +' + +test_expect_success 'git ahead-behind --base=right' ' + git ahead-behind --base=right base left right merge >actual && + + cat >expect <<-EOF && + base 0 1 + left 1 1 + right 0 0 + merge 2 0 + EOF + + test_cmp expect actual +' + +test_expect_success 'git ahead-behind --base=merge' ' + git ahead-behind --base=merge base left right merge >actual && + + cat >expect <<-EOF && + base 0 3 + left 0 2 + right 0 2 + merge 0 0 + EOF + + test_cmp expect actual +' + test_done diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh index 338a9c46a24..951e07100f6 100755 --- a/t/t6600-test-reach.sh +++ b/t/t6600-test-reach.sh @@ -443,4 +443,66 @@ test_expect_success 'get_reachable_subset:none' ' test_all_modes get_reachable_subset ' +test_expect_success 'ahead-behind:linear' ' + cat >input <<-\EOF && + commit-1-1 + commit-1-3 + commit-1-5 + commit-1-8 + EOF + cat >expect <<-\EOF && + commit-1-1 0 8 + commit-1-3 0 6 + commit-1-5 0 4 + commit-1-8 0 1 + EOF + run_all_modes git ahead-behind --base=commit-1-9 --stdin +' + +test_expect_success 'ahead-behind:all' ' + cat >input <<-\EOF && + commit-1-1 + commit-2-4 + commit-4-2 + commit-4-4 + EOF + cat >expect <<-\EOF && + commit-1-1 0 24 + commit-2-4 0 17 + commit-4-2 0 17 + commit-4-4 0 9 + EOF + run_all_modes git ahead-behind --base=commit-5-5 --stdin +' + +test_expect_success 'ahead-behind:some' ' + cat >input <<-\EOF && + commit-1-1 + commit-5-3 + commit-4-8 + commit-9-9 + EOF + cat >expect <<-\EOF && + commit-1-1 0 53 + commit-5-3 0 39 + commit-4-8 8 30 + commit-9-9 27 0 + EOF + run_all_modes git ahead-behind --base=commit-9-6 --stdin +' + +test_expect_success 'ahead-behind:none' ' + cat >input <<-\EOF && + commit-7-5 + commit-4-8 + commit-9-9 + EOF + cat >expect <<-\EOF && + commit-7-5 7 4 + commit-4-8 16 16 + commit-9-9 49 0 + EOF + run_all_modes git ahead-behind --base=commit-8-4 --stdin +' + test_done