Message ID | 0fb3913810b7d4731707a0129df7f5d57c8ab39f.1678902343.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | ref-filter: ahead/behind counting, faster --merged option | expand |
First of all, thanks to Taylor and Stolee for this algorithm and code - it is straightforwardly written and looks correct to me. I have some commit message and code comment suggestions that if taken, would have helped me on my first reading, but these are subjective so feel free to ignore them if you think they would add unnecessary detail (I did understand the algorithm in the end, after all). "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > The second array, using the > new ahead_behind_count struct, indicates which commits from that initial > array form the base/tip pair for the ahead/behind count it will store. I would have preferred: The second array contains base/tip pairs designating pairs of commits for which ahead/behind counts need to be computed, each pair being a pair of indexes into the first array. > Each commit in the input commit list is associated with a bit position > indicating "the ith commit can reach this commit". Each of these commits > is associated with a bitmap with its position flipped on and then > placed in a queue for walking commit history. "this commit" is not necessarily a commit in the input commit list (it is actually the commit that we're currently at in our iteration) and I think that the association of bitmaps with commits in the input commit list could be more clearly described. So I would have preferred: Each commit in the priority queue is associated with a bitmap of width N (N being the count of commits in the first array), in which a bit is set iff the commit can be reached by the corresponding commit in the first array. This is different from packfile or MIDX bitmaps in that a commit's bitmap stores what can reach it, not what it can reach. The priority queue is initialized with N commits, each commit being associated with a bitmap in which a single bit is set (indicating that the commit can be reached by itself). > +void ahead_behind(struct repository *r, > + struct commit **commits, size_t commits_nr, > + struct ahead_behind_count *counts, size_t counts_nr) > +{ > + struct prio_queue queue = { .compare = compare_commits_by_gen_then_commit_date }; > + size_t width = (commits_nr + BITS_IN_EWORD - 1) / BITS_IN_EWORD; As we discussed in our Review Club, DIV_ROUND_UP can be used for this. (For those reading who do not know what Review Club is, search the archives and/or look out for future announcements!) > + if (bitmap_popcount(bitmap_p) == commits_nr) > + p->item->object.flags |= STALE; Might be worth adding a comment above the STALE line: this parent commit and all its ancestors can be reached by every commit in the commits list and thus can never be "ahead" or "behind" in any pair; mark this STALE so that, as an optimization, we can stop iteration if only STALE commits remain (since further iteration would never change any "ahead" or "behind" value).
On 3/15/2023 7:28 PM, Jonathan Tan wrote: > First of all, thanks to Taylor and Stolee for this algorithm and code > - it is straightforwardly written and looks correct to me. I have some > commit message and code comment suggestions that if taken, would have > helped me on my first reading, but these are subjective so feel free > to ignore them if you think they would add unnecessary detail (I did > understand the algorithm in the end, after all). I appreciate your comments here. I've done some reworking of the message based on what you say here, as well as the verbal feedback from review club. >> +void ahead_behind(struct repository *r, >> + struct commit **commits, size_t commits_nr, >> + struct ahead_behind_count *counts, size_t counts_nr) >> +{ >> + struct prio_queue queue = { .compare = compare_commits_by_gen_then_commit_date }; >> + size_t width = (commits_nr + BITS_IN_EWORD - 1) / BITS_IN_EWORD; > > As we discussed in our Review Club, DIV_ROUND_UP can be used for this. Got it! >> + if (bitmap_popcount(bitmap_p) == commits_nr) >> + p->item->object.flags |= STALE; > > Might be worth adding a comment above the STALE line: this parent commit > and all its ancestors can be reached by every commit in the commits > list and thus can never be "ahead" or "behind" in any pair; mark this > STALE so that, as an optimization, we can stop iteration if only STALE > commits remain (since further iteration would never change any "ahead" > or "behind" value). This is a helpful thing to point out, so a comment is appropriate. Overall, maybe algorithms like this should have more inline comments than we typically expect in the Git codebase. We want to make sure that these things are readable in the future, hopefully without digging too far in the history to find the lengthy commit message about it. I'll delay sending v4 until giving a little time to hear back on this point. My default is to not add the comments, but I'd be happy to, if we think this is an appropriate time to deviate from the standard. Thanks, -Stolee
diff --git a/commit-reach.c b/commit-reach.c index 2e33c599a82..1e5a1c37fb7 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -8,6 +8,7 @@ #include "revision.h" #include "tag.h" #include "commit-reach.h" +#include "ewah/ewok.h" /* Remember to update object flag allocation in object.h */ #define PARENT1 (1u<<16) @@ -941,3 +942,98 @@ struct commit_list *get_reachable_subset(struct commit **from, int nr_from, return found_commits; } + +define_commit_slab(bit_arrays, struct bitmap *); +static struct bit_arrays bit_arrays; + +static void insert_no_dup(struct prio_queue *queue, struct commit *c) +{ + if (c->object.flags & PARENT2) + return; + prio_queue_put(queue, c); + c->object.flags |= PARENT2; +} + +static struct bitmap *init_bit_array(struct commit *c, int width) +{ + struct bitmap **bitmap = bit_arrays_at(&bit_arrays, c); + if (!*bitmap) + *bitmap = bitmap_word_alloc(width); + return *bitmap; +} + +static void free_bit_array(struct commit *c) +{ + struct bitmap **bitmap = bit_arrays_at(&bit_arrays, c); + if (!*bitmap) + return; + bitmap_free(*bitmap); + *bitmap = NULL; +} + +void ahead_behind(struct repository *r, + struct commit **commits, size_t commits_nr, + struct ahead_behind_count *counts, size_t counts_nr) +{ + struct prio_queue queue = { .compare = compare_commits_by_gen_then_commit_date }; + size_t width = (commits_nr + BITS_IN_EWORD - 1) / BITS_IN_EWORD; + + if (!commits_nr || !counts_nr) + return; + + for (size_t i = 0; i < counts_nr; i++) { + counts[i].ahead = 0; + counts[i].behind = 0; + } + + ensure_generations_valid(r, commits, commits_nr); + + init_bit_arrays(&bit_arrays); + + for (size_t i = 0; i < commits_nr; i++) { + struct commit *c = commits[i]; + struct bitmap *bitmap = init_bit_array(c, width); + + bitmap_set(bitmap, i); + insert_no_dup(&queue, c); + } + + while (queue_has_nonstale(&queue)) { + struct commit *c = prio_queue_get(&queue); + struct commit_list *p; + struct bitmap *bitmap_c = init_bit_array(c, width); + + for (size_t i = 0; i < counts_nr; i++) { + int reach_from_tip = !!bitmap_get(bitmap_c, counts[i].tip_index); + int reach_from_base = !!bitmap_get(bitmap_c, counts[i].base_index); + + if (reach_from_tip ^ reach_from_base) { + if (reach_from_base) + counts[i].behind++; + else + counts[i].ahead++; + } + } + + for (p = c->parents; p; p = p->next) { + struct bitmap *bitmap_p; + + repo_parse_commit(r, p->item); + + bitmap_p = init_bit_array(p->item, width); + bitmap_or(bitmap_p, bitmap_c); + + if (bitmap_popcount(bitmap_p) == commits_nr) + p->item->object.flags |= STALE; + + insert_no_dup(&queue, p->item); + } + + free_bit_array(c); + } + + /* STALE is used here, PARENT2 is used by insert_no_dup(). */ + repo_clear_commit_marks(r, PARENT2 | STALE); + clear_bit_arrays(&bit_arrays); + clear_prio_queue(&queue); +} diff --git a/commit-reach.h b/commit-reach.h index 148b56fea50..f708c46e523 100644 --- a/commit-reach.h +++ b/commit-reach.h @@ -104,4 +104,35 @@ struct commit_list *get_reachable_subset(struct commit **from, int nr_from, struct commit **to, int nr_to, unsigned int reachable_flag); +struct ahead_behind_count { + /** + * As input, the *_index members indicate which positions in + * the 'tips' array correspond to the tip and base of this + * comparison. + */ + size_t tip_index; + size_t base_index; + + /** + * These values store the computed counts for each side of the + * symmetric difference: + * + * 'ahead' stores the number of commits reachable from the tip + * and not reachable from the base. + * + * 'behind' stores the number of commits reachable from the base + * and not reachable from the tip. + */ + unsigned int ahead; + unsigned int behind; +}; + +/* + * Given an array of commits and an array of ahead_behind_count pairs, + * compute the ahead/behind counts for each pair. + */ +void ahead_behind(struct repository *r, + struct commit **commits, size_t commits_nr, + struct ahead_behind_count *counts, size_t counts_nr); + #endif